There are currently two major commercial systems bringing "artificial intelligence" into the home: Google Home (via Google Actions) and Amazon Alexa. I put "artificial intelligence" in quotes because the principal AI component seems to be speech recognition, together with, in Google's case at least, the ability to deliver information based on its "big data" processing of the Web.
Be that as it may, Mycroft is a reasonably open-source system playing in the same space. That is, it listens to talk in the home and, when a trigger phrase is encountered, attempts to perform actions based on the phrases spoken next.
A Mycroft client may be downloaded to a Linux system and built. It contains a core and an extensible set of skills. The skills perform tasks such as responding to "what is the time?" (which can be answered locally) or "what is the weather?" (which invokes a Web request). These skills are written in Python and follow a simple structure, discussed by example later.
The document Mycroft core overview discusses the top-level view of the Mycroft architecture and gives the normal path of a query through the system.
Mycroft runs within a Python virtual environment. This is done to ensure that the correct runtime environment is present: everything is stored in the virtual environment, which avoids problems with missing packages, or packages upgraded in incompatible ways. It also ensures that python resolves to python2 and not python3!

But now if you want to use a package that is not in the virtual environment, you have to install it there, which is a bit more complicated than just using pip to install a package into the normal environment.
To install a package such as fuzzywuzzy, run:
$ source venv-activate.sh
Entering mycroft-core virtual environment. Run 'deactivate' to exit
(.venv) Desktop:$ pip install fuzzywuzzy
Collecting fuzzywuzzy
Using cached fuzzywuzzy-0.15.1-py2.py3-none-any.whl
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.15.1
Leave the virtual environment by running deactivate.
The command start.sh has a possible parameter of sdkdoc. Running this fails, as the Python module pdoc is not in the virtual environment. You need to install it as in the last section: start the virtual environment and use pip to install pdoc.
Then you can run
start.sh sdkdoc
The resultant HTML files are in the subdirectory build/doc/mycroft-skills-sdk/html/mycroft/. Currently the documentation isn't really worth looking at.
The default configuration shows the location as Lawrence, Kansas. However, setting the location in the config file probably isn't the best way of doing it: when Mycroft starts, it queries the Mycroft web backend for its location. It is easier to log in to home.mycroft.ai, select devices and then your chosen device. This allows the location to be set.
For anyone with a multitude of devices in the home, keeping track of remotes is a nightmare. Universal remotes replace these with a single remote controller which emulates the others. But they still control only a single device at a time.
I have a set of linked devices. When I want to watch something on my Blu-Ray player, I have to turn it on, turn on the TV and set it to the correct HDMI input, and turn on the amplifier and set it to the Blu-Ray input. Such a set of events for a single activity is common. The Logitech Harmony series of devices addresses this issue by allowing a set of devices to be configured under the banner of an activity.
I have used the Harmony One for many years as an activity-based remote: tedious to set up, but a joy to use afterwards. The Harmony Hub is the latest model; its remote now talks to a Hub device, which is connected to your network by WiFi and uses IR to talk to your devices. The network capability brings it into the world of the IoT.
Logitech has worked with Google to develop an Action so that Google Home can recognise and forward commands related to the Hub, such as
Okay Google, ask Harmony to watch Blu-Ray
Google Home is great, but I can't see the details of the Google Home integration, so I'm relying on Logitech and Google to keep that interaction working.
There is also an IFTTT stanza to talk to the Harmony Hub. IFTTT is great, but I can't see the details of the stanza, so I'm relying on Logitech and IFTTT to keep that interaction working.
These are external services, not under my control, manipulating a device within my own home. Not completely satisfactory. Fortunately there has been an open source effort to decode the protocol used by the Harmony Hub. The latest version of this is a Java package at tuck182/harmony-java-client: Java client for communicating with a Harmony Hub.
The Java Hub client makes use of a logging package, which is great for ... debugging. But it gets in the way when trying to interact with the client programmatically, as results and log output appear mixed together on stdout.
Logging is done using the slf4j logging framework, which is implemented by a logging library such as Log4j. Logging is nice most of the time, but it is sometimes nice to be without it. Unfortunately, there isn't a simple way to turn it off.
One way that works for me is to include the file slf4j-nop.jar, a "no-op" logger implementation, in the classpath. But if you run a jar file using the -jar option, the classpath is ignored. Also, there is a default logger implementation included in the harmony-java-client jar file. So first get rid of that default by deleting it from the jar file:
zip -d harmony-java-client-master-1.2.1-all.jar org/slf4j/impl/StaticLoggerBinder.class
Then you can start the Java Hub client using the classpath option by
java -cp .../slf4j-nop.jar:.../harmony-java-client-master/build/libs/harmony-java-client-master-1.2.1-all-nodebug.jar net.whistlingfish.harmony.Main HUB_IP
where the '...' are the relative or absolute paths to the jar files and HUB_IP is the IP address of the Harmony Hub.
The Java client reads commands in simple English from stdin and writes responses to stdout. It always starts by telling you the current activity according to the Hub.
| Request | Response |
|---|---|
| (none: printed on startup) | activity changed: [28199547] Listen to Radio |
| list activities | 28199546: Play Game<br>28199556: Chromecast<br>28199554: External PC<br>-1: PowerOff<br>28238778: Watch Blu-Ray<br>28199547: Listen to Radio |
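The "id: name" lines in a response like the one above are easy to pick apart programmatically. A minimal sketch (the parse_activities helper and the sample text are mine for illustration, not part of the client):

```python
def parse_activities(response):
    """Parse the 'id: name' lines of a 'list activities' response
    into a dict mapping activity name to activity id, skipping the
    'activity changed' status line the client always emits first."""
    activities = {}
    for line in response.splitlines():
        if ":" in line and not line.startswith("activity changed"):
            activity_id, _, name = line.partition(":")
            activities[name.strip()] = activity_id.strip()
    return activities

sample = ("activity changed: [28199547] Listen to Radio\n"
          "28199546: Play Game\n"
          "-1: PowerOff\n"
          "28238778: Watch Blu-Ray\n")
print(parse_activities(sample))
# {'Play Game': '28199546', 'PowerOff': '-1', 'Watch Blu-Ray': '28238778'}
```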
Any skill requires at least one Python program to execute commands, called intents. It also requires a directory vocab/en-us listing the phrases (in US English) that trigger an intent, and a directory dialog/en-us for responses (in US English). The language can of course be changed.
The file structure for the Mycroft Harmony Hub client is
./__init__.py
./vocab/en-us/StartActivityKeywords.voc
./vocab/en-us/ListDevicesKeywords.voc
./vocab/en-us/ListActivitiesKeywords.voc
./vocab/en-us/ShowActivityKeywords.voc
./dialog/en-us/start.activity.dialog
./dialog/en-us/list.devices.dialog
./dialog/en-us/list.activities.dialog
./dialog/en-us/show.activity.dialog
The file StartActivityKeywords.voc contains the lines
harmony start
harmony begin
which are the triggers for the "start activity" intent.
The file start.activity.dialog contains the response for a successful invocation:
Starting activity {{activity}}
where {{activity}} is a variable substitution set by the Python code.
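The {{...}} notation is Mustache-style templating. Mycroft's own dialog renderer performs the substitution, but the effect can be sketched in a few lines of Python (render_dialog is a hypothetical stand-in of mine, not Mycroft's API):

```python
import re

def render_dialog(template, data):
    """Replace {{key}} placeholders with values from a dict,
    leaving unknown placeholders untouched."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(data.get(m.group(1), m.group(0))),
                  template)

print(render_dialog("Starting activity {{activity}}",
                    {"activity": "Watch Blu-Ray"}))
# Starting activity Watch Blu-Ray
```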
The file __init__.py contains code to handle each intent. Corresponding to the dialogs and vocabularies listed earlier, there are the intents
StartActivityIntent
ListDevicesIntent
ListActivitiesIntent
ShowActivityIntent
Each intent is associated with the vocabulary that triggers it by, for example,
IntentBuilder("ListActivitiesIntent").\
    require("ListActivitiesKeywords").build()
There are several ways of associating code with an intent; Mycroft now seems to favour a "decorator" mechanism, which is less readable than earlier ones.
The following links the method list_activities_intent to the ListActivitiesIntent handler.
@intent_handler(IntentBuilder("ListActivitiesIntent").\
                require("ListActivitiesKeywords").build())
def list_activities_intent(self, message):
    """List all activities Harmony knows about"""
Now we can turn to the code to process the intent. We use a helper method (given later), send_command, which writes a string to the Java Hub client and returns the response as a string. Every response starts with the current activity, so all the intent methods drop this first line and keep an array of the remaining lines:
result = self.send_command('list activities')
# lose first line "activity changed..."
activities = result.split("\n", -1)[1:]
The activities are listed in the form "id: name", so now it is a matter of working through the list and showing the name.
The output is given by calling the inherited speak_dialog method, which takes the dialog name and a dictionary of variable/value pairs to be interpolated into the dialog (which contains the pattern {{activity}}).
The complete code for the ListActivitiesIntent
is
@intent_handler(IntentBuilder("ListActivitiesIntent").\
                require("ListActivitiesKeywords").build())
def list_activities_intent(self, message):
    """List all activities Harmony knows about"""
    LOGGER.debug("Harmony: list activities")
    result = self.send_command('list activities')
    # lose first line "activity changed..."
    activities = result.split("\n", -1)[1:]
    for activity in activities:
        activity_name_loc = string.find(activity, ":")
        # ignore non ':' lines
        if activity_name_loc >= 0:
            activity_name = activity[activity_name_loc+2 : ]
            report = {"activity": activity_name}
            self.speak_dialog("list.activities", report)
An intent such as "list activities" doesn't take any parameters. An intent such as "start Blu-Ray" does have a parameter, and the speech recognition engine will need to get that right. The Java Hub client on my home system will recognise "Blu-Ray", but not "Blu Ray", "blu-ray", "blue ray" or any of the other possibilities. It will, however, recognise "28238778", as that is the id belonging to that activity. Something has to disambiguate the recognised speech to an unambiguous form that the device handler will recognise.
Perhaps the Harmony Hub recognises all the different forms that could be sent by IFTTT or by Google Home: without access to the code we don't know. We can change the code of the open source Java Hub client, but for now it is just easier to disambiguate the text in the Python intent, and send the unambiguous id. So we do pattern matching in the Python intent.
There are several pattern matching engines available as Python packages. I chose the fuzzywuzzy package. I use it in a method that takes an array of activity strings of the form "id: name", finds the best match of the name against an input pattern, and returns the best-matching "id: name" string:
def best_match(self, str, options):
    """Return best fuzzy match to a string from a list of strings
    """
    max_match = 0
    option_match = ""
    for option in options:
        colon_at = string.find(option, ':')
        if colon_at >= 0:
            option_name = option[colon_at+2 : ]
            fuzz_value = fuzz.ratio(str, option_name)
            if fuzz_value > max_match:
                max_match = fuzz_value
                option_match = option
    return option_match
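For readers without fuzzywuzzy installed, the same idea can be illustrated with the standard library's difflib. The scoring differs slightly (fuzz.ratio is Levenshtein-based), but the structure is identical; best_match_stdlib and the sample options below are mine for illustration:

```python
from difflib import SequenceMatcher

def best_match_stdlib(pattern, options):
    """Return the 'id: name' option whose name part best matches
    pattern, scored with difflib instead of fuzzywuzzy."""
    max_ratio = 0.0
    option_match = ""
    for option in options:
        colon_at = option.find(":")
        if colon_at >= 0:
            option_name = option[colon_at + 2:]
            ratio = SequenceMatcher(None, pattern.lower(),
                                    option_name.lower()).ratio()
            if ratio > max_ratio:
                max_ratio = ratio
                option_match = option
    return option_match

options = ["28238778: Watch Blu-Ray",
           "28199547: Listen to Radio",
           "-1: PowerOff"]
print(best_match_stdlib("blue ray", options))
# 28238778: Watch Blu-Ray
```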
That gives us the best match, once we have isolated the string to match against. Here it gets a bit obscure and undocumented. Each intent handler is passed a message parameter. This is of type mycroft.messagebus.Message and has a number of fields, including a dictionary. The dictionary key utterance contains the spoken string, such as "harmony start blue ray". There are (human) language dependencies here: a French version might be "harmony commencer[sic] bleu[sic] ray", while a Chinese version might be "harmony 开始 blue ray". (No, sorry, I don't have multilingual versions at present, but that is what they could look like if I did.) So to extract the payload of the key string, we have to discard the possibly language-dependent initial phrase. This is contained in the dictionary entry StartActivityKeywords.
Putting that all together gives us
@intent_handler(IntentBuilder("StartActivityIntent").\
                require("StartActivityKeywords").build())
def start_activity_intent(self, message):
    """Start a Harmony activity

    The message has the activity, but maybe not quite in
    the form required by the Harmony controller.
    So get the accepted list, match against it and invoke
    the best match
    """
    LOGGER.debug("Harmony: start activity " + message.data.get('utterance'))
    key = str(message.data.get(u'StartActivityKeywords'))
    utterance = str(message.data.get(u'utterance'))
    payload = string.replace(utterance, key, "")
    payload = string.strip(payload)
    LOGGER.debug("Harmony: utterance: " + utterance +
                 " key " + key + " payload " + payload)
    # Java interface to Harmony is case sensitive
    # and we are best off using the activity id rather than name
    # so first we have to get the activities with id and name
    result = self.send_command('list activities')
    # lose first line "activity changed..."
    activities = result.split("\n", -1)[1:]
    # and find the best match
    activity = self.best_match(payload, activities)
    # activity is of the form "id: name", get the id
    activity_name_loc = string.find(activity, ":")
    id = activity[0 : activity_name_loc]
    activity_name = activity[activity_name_loc+2 : ]
    LOGGER.debug("Harmony: id: " + str(id))
    result = self.send_command('start ' + id)
    # say what is happening
    report = {"activity": activity_name}
    self.speak_dialog("start.activity", report)
This method sends a message to the Java Hub client and returns the string response:
def send_command(self, command):
    """Send a command to the Java Harmony controller

    Returns a string of the response from the Java process
    """
    p = Popen(["java", "-cp",
               JAVA_CLASSPATH,
               "net.whistlingfish.harmony.Main",
               HARMONY_HOST_IP], stdin=PIPE, stdout=PIPE, bufsize=1)
    # communicate() signals the child to exit by closing stdin,
    # reads the rest of the output and waits for the child to
    # terminate; it returns a (stdout, stderr) tuple
    response = p.communicate(command)[0]
    return response
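The communicate() call used in send_command writes the command, closes the child's stdin (signalling the child to exit), reads the remaining output and waits for the child to terminate. A stand-alone illustration on a Unix system, with cat playing the role of the Java Hub client (it simply echoes the command back):

```python
from subprocess import Popen, PIPE

# 'cat' stands in for the Java Hub client here: it copies stdin
# to stdout, then exits when communicate() closes its stdin.
p = Popen(["cat"], stdin=PIPE, stdout=PIPE, universal_newlines=True)
response = p.communicate("list activities\n")[0]
print(repr(response))
# 'list activities\n'
```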
Running Mycroft on your PC or laptop is one way to go. Buying the cool-looking Picroft is another way.
A third way is to build your own device in some other container. A constraint is that you will need a good, small microphone that can be built into your Mycroft "container". Two choices are the Matrix Voice and the Google AIY Voicehat, part of the Google AIY voice kit.
I bought the AIY voice kit, but it was a hassle connecting it up to all the Google services, and anyway, I wanted to play with Mycroft. Now there is an image that will run on the RPi 3B+: follow the instructions in HACKING.md at the aiyprojects-raspbian site. This image has drivers for the Google AIY Voicehat.
Then you can download the latest version of Mycroft for Linux and build that in the normal way for Mycroft. It takes time to build, but the resultant Mycroft system runs fine on the Voicehat.
The alternative of adding the Voicehat drivers to Picroft is not feasible for me at the moment, as the only spare RPis I have are all model 3B+, and as of Sept 2, 2018 this model is not supported. The HACKING.md page contains instructions about the drivers needed when it becomes possible to run Picroft on the 3B+.
Copyright © Jan Newmarch, jan@newmarch.name
"The Internet of Things - a techie's viewpoint" by Jan Newmarch is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/IoT/.
If you like this book, please donate using PayPal