This is an experiment in writing a virtual filesystem in Python using fusepy.

The complete code is available on my GitHub page.



What does this application do?

It aggregates daily, weekly and monthly horoscopes, based on your sunsign and/or moonsign, from various horoscope websites and makes them accessible under a single directory in a predictable, tree-like directory structure.

Let’s get started.



How to run it?

$ mkdir -p /tmp/mnt; python horoscopeFS.py /tmp/mnt/ <sunsign> <moonsign>

# Run the commands below from another terminal

$ cd /tmp/mnt

$ tree /tmp/mnt
/tmp/mnt
|-- Astrosage
|   |-- daily
|   |-- monthly
|   `-- weekly
|-- Astroyogi
|   |-- daily
|   |-- monthly
|   `-- weekly
|-- AstroyogiCareer
|   |-- daily
|   |-- monthly
|   `-- weekly
`-- IndianAstrology2000
    |-- daily
    |-- monthly
    `-- weekly

You can see how the application provides a clean and consistent tree-like directory structure for horoscopes aggregated from different websites. You can navigate the directories and read the files just as you would any other directory or file.



The usual imports

import os
import sys
import bs4
import fuse
import tempfile
import requests
import textwrap
import argparse
  • bs4 - Beautiful Soup for parsing HTML.

  • fuse - fusepy to create a virtual filesystem using FUSE.

  • tempfile - To create temporary files and directories.

  • requests - To fetch HTML contents from a website.

  • textwrap - To format text for display.

  • argparse - To parse command line arguments.



The starting point

Let’s start by parsing the command line arguments and instantiating an object that provides the filesystem functionality.

The user needs to provide a mount point for the filesystem along with their sunsign and moonsign.

def main(mountpoint, sunsign, moonsign):
    fuse.FUSE(HoroscopeFS(sunsign, moonsign),
              mountpoint,
              nothreads=True,
              foreground=True)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("mountpoint", help="Mount point for the Virtual File System")
    parser.add_argument("sunsign", help="Your sun sign")
    parser.add_argument("moonsign", help="Your moon sign")
    args = parser.parse_args()

    main(args.mountpoint, args.sunsign.lower(), args.moonsign.lower())



The filesystem implementation

The following is the main class that implements the filesystem functionality. Without it, you won’t be able to navigate the filesystem or read data. This might seem hard to implement but, as you’ll see, it is actually pretty simple for the application we’re writing.

Note that I’ve removed a lot of the original code (which you can follow along with on my GitHub page) in order to concentrate on what is relevant at this stage.

class HoroscopeFS(fuse.Operations):
    """Virtual filesystem for aggregating horoscopes from various websites"""

    def __init__(self, sunsign, moonsign):
        pass

    def getattr(self, path, fh=None):
        pass

    def readdir(self, path, fh):
        pass

    def read(self, path, length, offset, fh):
        pass

Now, let’s take a moment to think about what filesystem functionality our application will need to provide to its users.

Firstly, we’ll need to support navigating the tree-like directory structure that our application exposes to its users.

Secondly, we’ll need to be able to list the contents of a directory, i.e., the files in it.

Lastly, we’ll need to be able to read the contents of files.

You might be surprised, but the last three methods above - getattr(), readdir() and read() - are the only ones you’ll need to implement to provide all of this.

getattr(), as you might have guessed, is used to get directory/file attributes like permissions, size, etc.

readdir() is used to list the contents of a directory, i.e., the files inside it.

read() is used to read the contents of a file.

Now, let’s walk through how you might actually implement these.


def __init__(self, sunsign, moonsign):
    # Get default stats for an empty directory and empty file.
    # The temporary directory and file are automatically deleted.
    with tempfile.TemporaryDirectory() as tmp_dir:
        self.stat_dict_dir = \
            self._convert_stat_to_dict(os.lstat(tmp_dir))

    with tempfile.NamedTemporaryFile() as tmp_file:
        self.stat_dict_file = \
            self._convert_stat_to_dict(os.lstat(tmp_file.name))

    self.sunsign = sunsign
    self.moonsign = moonsign
    self.dot_dirs = ['.', '..']
    self.current_module = sys.modules[__name__]
    self.horoscope_objs = {}

The tempfile module is used to create a temporary directory and a temporary file so that we can capture their stats. These stats are then reused for every directory and file in our virtual filesystem, with modifications wherever required.
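
The _convert_stat_to_dict() helper is not shown in this post. A minimal sketch of what such a helper might look like follows, assuming it only needs to copy a handful of fields from an os.stat_result into a plain dictionary. The STAT_KEYS tuple and the function body are assumptions made for illustration, not the original code:

```python
import os
import stat
import tempfile

# Assumed set of fields to copy; the original helper may pick different ones.
STAT_KEYS = ("st_atime", "st_ctime", "st_gid", "st_mode",
             "st_mtime", "st_nlink", "st_size", "st_uid")


def convert_stat_to_dict(stat_result):
    """Copy selected fields of an os.stat_result into a plain dict."""
    return {key: getattr(stat_result, key) for key in STAT_KEYS}


# Capture template attributes from a throwaway directory and file.
with tempfile.TemporaryDirectory() as tmp_dir:
    stat_dict_dir = convert_stat_to_dict(os.lstat(tmp_dir))

with tempfile.NamedTemporaryFile() as tmp_file:
    stat_dict_file = convert_stat_to_dict(os.lstat(tmp_file.name))

# The mode bits tell a directory apart from a regular file.
print(stat.S_ISDIR(stat_dict_dir["st_mode"]))
print(stat.S_ISREG(stat_dict_file["st_mode"]))
print(stat_dict_file["st_size"])
```

On POSIX systems, tempfile creates its directories and files with owner-only permissions, so anything built from these templates inherits those permissions unless st_mode is changed.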

self.dot_dirs stores the current and parent directory entries that every directory has, so that they can be included when a directory’s contents are listed later on.

self.horoscope_objs contains one object per horoscope website that we’re aggregating from. It starts out empty and is filled on demand, which is to say an object is created for a particular website when the user navigates into the top-level directory corresponding to that website.
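
To make the on-demand idea concrete, here is a small self-contained sketch. FakeSite, SITE_CLASSES and LazySites are hypothetical names used only for this illustration. The original code keeps a reference to the module in self.current_module, presumably to look the class up by name; this sketch swaps in an explicit dictionary instead so that it runs on its own:

```python
class FakeSite:
    """Stand-in for a site class such as Astrosage; counts instantiations."""
    instances = 0

    def __init__(self, sunsign, moonsign):
        FakeSite.instances += 1  # the real class would fetch pages here
        self.signs = (sunsign, moonsign)


# Hypothetical registry mapping a directory name to its class.
SITE_CLASSES = {"Astrosage": FakeSite}


class LazySites:
    def __init__(self, sunsign, moonsign):
        self.sunsign = sunsign
        self.moonsign = moonsign
        self.horoscope_objs = {}  # empty until a site is first visited

    def construct_obj_from_path(self, path):
        """Create the object for the site named in `path`, at most once."""
        site = path.strip("/").split("/")[0]
        if site not in self.horoscope_objs:
            cls = SITE_CLASSES[site]
            self.horoscope_objs[site] = cls(self.sunsign, self.moonsign)
        return self.horoscope_objs[site]


fs = LazySites("aries", "leo")
first = fs.construct_obj_from_path("/Astrosage/daily")
second = fs.construct_obj_from_path("/Astrosage/weekly")
print(first is second, FakeSite.instances)
```

Both paths resolve to the same cached object, so the (potentially slow) fetch in the constructor happens only once per website.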




def getattr(self, path, fh=None):
    if any(map(path.endswith, horoscope_sites)):
        # For directories corresponding to the horoscope websites,
        # return the default stats for directory.
        return self.stat_dict_dir
    elif any(map(path.endswith, horoscope_types)):
        # Fetch content from the horoscope site we are looking at on-demand.
        self._construct_obj_from_path(path)

        # For files corresponding to the horoscope types,
        # return the stats for the file with st_size set appropriately.
        stat = dict(self.stat_dict_file) # Create a copy before modifying
        stat['st_size'] = self._get_file_size_from_path(path)
        return stat
    else:
        # For all other files/directories, return the stats from the OS.
        return self._convert_stat_to_dict(os.lstat(path))

The getattr() method is responsible for returning a dictionary of directory/file attributes.

We know that the path of any directory in our virtual filesystem, other than the mount point itself, ends with the name of a horoscope website. We use this to return the attributes of the temporary directory that we created during __init__(), without any modifications whatsoever.

For files, we need to do a bit more work. The attributes of a file depend on its contents (the size of the file, for example). So, we first need to fetch the contents (the horoscope) from the horoscope website before we can decide what the file’s attributes should be. This is what the _construct_obj_from_path() method does. Given the full path to the file, it infers the horoscope website from the path and instantiates an object of the class corresponding to that website on demand. Instantiating the object fetches the horoscope from the website and stores it for future use (we’ll see how to fetch the contents later in this post). Note that an object is instantiated only if one doesn’t already exist for that website.

Once the contents are fetched and stored locally, we can calculate the file size from the size of the fetched contents, which is what _get_file_size_from_path() does. We then take a copy of the attributes of the temporary file we created during __init__(), override only the size, and return the result.
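
The size calculation can be sketched roughly as follows. This is not the original _get_file_size_from_path(); it assumes that each site object keeps its fetched content as bytes in a dictionary called horoscope, keyed by horoscope type (as the Astrosage class later in this post does), and it uses a SimpleNamespace stand-in and a cut-down attribute template instead of real objects:

```python
from types import SimpleNamespace

text = "A caf\u00e9 visit brings luck.\n"

# Stand-in for an already-constructed site object holding fetched content.
horoscope_objs = {
    "Astrosage": SimpleNamespace(horoscope={"daily": text.encode()})
}

# Cut-down template attributes: a regular file with mode 0600 and size 0.
stat_dict_file = {"st_mode": 0o100600, "st_nlink": 1, "st_size": 0}


def get_file_size_from_path(path):
    """Return the size in bytes of the content behind '/<site>/<type>'."""
    site, horoscope_type = path.strip("/").split("/")
    return len(horoscope_objs[site].horoscope[horoscope_type])


def getattr_for_file(path):
    attrs = dict(stat_dict_file)  # copy so the shared template stays intact
    attrs["st_size"] = get_file_size_from_path(path)
    return attrs


attrs = getattr_for_file("/Astrosage/daily")
print(attrs["st_size"], len(text))
```

The size has to be measured on the encoded bytes, not on the text: a non-ASCII character such as "é" takes more than one byte in UTF-8, and a size that is too small would cause readers to stop before the end of the file.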




def readdir(self, path, fh):
    if any(map(path.endswith, horoscope_sites)):
        # Each horoscope website directory contains one file for each
        # horoscope type.
        return self.dot_dirs + horoscope_types
    else:
        # Top level directory (mountpoint) contains one directory for each
        # horoscope website.
        return self.dot_dirs + horoscope_sites

The readdir() method is used to list the contents of a directory.

If the directory’s name ends with the name of a horoscope website, then we already know what files it should contain: the files corresponding to the “daily”, “weekly” and “monthly” horoscopes.

If the directory’s name is something else (in our case, it can only be the mount point because there are no other directories in our application), then we return the names of the directories corresponding to the horoscope websites.

Of course, we always have to return the two special directories (‘.’ and ‘..’) corresponding to the current and parent directories.
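
horoscope_sites and horoscope_types are module-level lists that this post never shows. Judging by the tree output earlier, they presumably look something like the lists below; the exact values and their order are assumptions. The list_dir() function is a standalone stand-in for readdir() so that the logic can be tried without mounting anything:

```python
# Assumed contents, inferred from the directory tree shown earlier.
horoscope_sites = ["Astrosage", "Astroyogi",
                   "AstroyogiCareer", "IndianAstrology2000"]
horoscope_types = ["daily", "weekly", "monthly"]

dot_dirs = [".", ".."]


def list_dir(path):
    """Mirror of readdir(): `path` is relative to the mount point."""
    if any(map(path.endswith, horoscope_sites)):
        # A website directory holds one file per horoscope type.
        return dot_dirs + horoscope_types
    # Anything else can only be the mount point itself.
    return dot_dirs + horoscope_sites


print(list_dir("/"))
print(list_dir("/Astrosage"))
```

Note that the paths FUSE passes in are relative to the mount point, so the top-level directory arrives as "/" rather than "/tmp/mnt".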




def read(self, path, length, offset, fh):
    return self._read_data_from_path(path, length, offset)

read() is used to read the contents of a file (the actual horoscope). The _read_data_from_path() method simply returns the contents that were fetched from the horoscope website when an object was instantiated for that website in _construct_obj_from_path().
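
The length and offset arguments matter because a file is usually read in chunks rather than in one go. The body of _read_data_from_path() isn't shown here, but assuming the content is held as bytes, honouring those two arguments can be as simple as a slice. read_data() below is a hypothetical standalone version:

```python
def read_data(data, length, offset):
    """Return at most `length` bytes of `data`, starting at `offset`."""
    return data[offset:offset + length]


data = b"Today is a good day to start something new.\n"

# Simulate a reader that asks for the file in 16-byte chunks.
chunks = []
offset = 0
while True:
    chunk = read_data(data, 16, offset)
    if not chunk:  # an empty result signals end of file
        break
    chunks.append(chunk)
    offset += len(chunk)

print(b"".join(chunks) == data)
```

Slicing past the end of a bytes object returns an empty bytes object rather than raising, which is why no explicit bounds check is needed.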




That wasn’t too difficult! Now all that remains to be seen is how to fetch the horoscope from the horoscope website.



The business logic

Let’s walk through fetching contents from one horoscope website. All the others work similarly.

As I mentioned previously, each horoscope website is a class of its own. Contents are fetched from the website when an object of this class is instantiated. For as long as the application is running, at most one object of each class is instantiated. Objects are instantiated on demand, i.e., when the user actually tries to list or read the contents of the directory corresponding to the horoscope website.

The name of the directory corresponding to a horoscope website is the same as the name of the class for that website.

class Req(object):
    """Get HTML page using requests and parse it using BeautifulSoup"""

    def __init__(self):
        super().__init__()


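    def _get(self, url, timeout=30):
        """Return the parsed page, or None if the request fails."""
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return bs4.BeautifulSoup(response.text, "html.parser")
        except requests.RequestException:
            # Covers connection errors, timeouts and HTTP error statuses.
            return None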


class Astrosage(Req):
    """Horoscopes from www.astrosage.com"""

    def __init__(self, sunsign, moonsign):
        super().__init__()

        # No trailing slash here: the URL template below adds the separator.
        base_url = "http://www.astrosage.com/horoscope"
        self.horoscope = {}
        for horoscope_type in horoscope_types:
            url = "{}/{}-{}-horoscope.asp"
            url = url.format(base_url, horoscope_type, moonsign)
            self.horoscope[horoscope_type] = self._parse_html(url, horoscope_type)


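    def _parse_html(self, url, horoscope_type):
        soup = self._get(url)
        if soup is None:
            # NA is not defined in this excerpt; presumably it is a
            # module-level placeholder in the full source.
            return NA

        if horoscope_type == "daily":
            html_class_attr = "ui-large-content-box"
        else:
            html_class_attr = "ui-sign-content-box"

        tag = soup.find(class_=html_class_attr)
        if tag is None:
            # The page layout changed or the element is missing.
            return NA

        content = textwrap.fill(tag.text.strip()) + "\n"
        return content.encode()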

__init__() is called when we want to instantiate an object for a horoscope website. This is when we fetch the contents (the horoscope) from the website and parse the HTML returned, which is what the _get() method inherited from the Req class does: it uses the requests module to get the HTML and then runs Beautiful Soup to parse it.

Finally, in _parse_html(), we search for the HTML element with the class that holds the horoscope and extract only that bit, which is then passed through the textwrap module’s fill() function. With no width argument, fill() wraps the text at 70 columns so that it is easier to read on the terminal. This is the content that the read() method of the HoroscopeFS class will eventually return to the user.
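
The formatting step at the end of _parse_html() can be tried on its own. The snippet below is just an illustration with made-up text standing in for what Beautiful Soup would extract; it shows textwrap.fill() collapsing stray whitespace, wrapping at its default width of 70 columns, and the result being encoded to bytes:

```python
import textwrap

# Made-up text with the kind of stray whitespace scraped HTML often has.
raw = ("\n   The stars suggest a calm and productive day.\n"
       "      Take your time with important decisions.   " * 4)

content = textwrap.fill(raw.strip()) + "\n"  # default width is 70
payload = content.encode()                   # bytes, ready to be served

longest = max(len(line) for line in content.splitlines())
print(content)
print("longest line:", longest)
```

Encoding to bytes at this point means the length of the stored value is already the file size in bytes, which is what getattr() needs to report.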




That’s it! We’ve now written a simple virtual filesystem in Python using FUSE. There is a lot of other filesystem functionality that we didn’t need to use or override for our application but, if you ever need any of it for your own application, I’d recommend looking at the fusepy examples section.