Using a WebAssembly based Search in Gatsby with Stork

Recently I decided to add search functionality to this blog. At first I thought I would just integrate a standard SaaS solution, but where would the fun be in that? Instead, I went down several different paths and tried a few libraries until I ended up using stork-search in my Gatsby website. This gives my site a fast WebAssembly search built in Rust, with the index generated during the build process to save the client from any extra processing. In this post I'll go over the alternatives I looked at, generating the files needed for the index, creating the components in Gatsby, and making sure it works with your build provider.

Stork is a fairly straightforward way of implementing a full-text search engine in a website. The tool has two parts: a CLI that you use to build an index, and a JavaScript library that provides a basic user interface. The CLI can be installed by following this guide; the install options include Homebrew for Mac, binaries for Ubuntu, and Cargo for other platforms. The UI library is quite easy to install, as it's just a basic JS library and some CSS. Check out this link for a basic implementation of Stork that our Gatsby solution will be based on.

Other Static Site Searches

Stork isn't the only search available for static sites; others include LunrJS, Algolia Search, and TinySearch. Here's a quick overview of these options.

LunrJS

  • Client-Side and JavaScript based.
  • Popular to use as a Client-Side search in Gatsby Sites.
  • More Customizable than Stork.

Algolia Search

  • Server-side SaaS search service.
  • Generous free tier but is paid over a certain amount.
  • Widely Used.
  • Easy to Implement.

TinySearch

  • Client-Side with WebAssembly and Rust based.
  • Uses a Bloom Filter
  • Super Lightweight
  • Hard to integrate the index builder into a Cloudflare Pages build.

I originally started with TinySearch because I wanted something fast and WebAssembly-based on my site. I ended up switching to Stork, as it gave a better user experience and still used WebAssembly.

Generating Files for the Index

Stork uses a CLI tool to generate an index. To supply Stork with the data it needs, we have to create a config.toml file. This file contains the title, the URL, and a path to the content of each page. Below is an example config.

[input]
base_directory = "temp"
url_prefix = "/"
files = [
    {path = "post1.md", url = "/post1", title = "Post 1"},
    {path = "post2.md", url = "/post2", title = "Post 2"},
    {path = "post3.md", url = "/post3", title = "Post 3"},
]

We can create this config file and populate the markdown files using onPostBuild in the gatsby-node.js file. This function needs to find the pages for search using GraphQL and export them so the index can be built. To do this we need the @iarna/toml library along with Node's built-in path, fs, and child_process modules. I ended up exporting the markdown files and the TOML config into a temp directory for the index creation. Check out the example below for a sample implementation that you'll need to customize for your Gatsby setup.

const path = require("path")
const fs = require("fs")
const TOML = require('@iarna/toml')
const cp = require('child_process');
exports.onPostBuild = async ({ graphql }) => {
  // Query every markdown post that should be included in the search index.
  await graphql(`
  {
    allMarkdownRemark(
      sort: { order: DESC, fields: [frontmatter___date] }
    ) {
      edges {
        node {
          id
          rawMarkdownBody
          frontmatter {
            slug
            template
            title
          }
        }
      }
    }
  }
  `).then(result => {
    const postsPath = "./temp"

    const posts = result.data.allMarkdownRemark.edges.map(({ node }) => node)

    if (!fs.existsSync(postsPath)) fs.mkdirSync(postsPath)
    const files = [];

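    // Write each post's markdown to the temp directory and record it for the config.
    posts.forEach(post => {
      let slug = path.basename(
        post.frontmatter.slug,
        path.extname(post.frontmatter.slug)
      )
      // The root page has an empty base name, so call its file "index".
      if (slug.length === 0) {
        slug = "index";
      }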

      fs.writeFileSync(`${postsPath}/${slug}.md`, post.rawMarkdownBody);

      files.push({
        path: `${slug}.md`,
        url: post.frontmatter.slug,
        title: post.frontmatter.title
      });
    })

    const config = {
      input: {
        base_directory: "temp",
        url_prefix: "",
        files: files
      }
    }
    const tomlConfig = TOML.stringify(config)
    fs.writeFileSync(`${postsPath}/config.toml`, tomlConfig)
    // Run the Stork binary kept in ./tools to build the index from the config.
    const child = cp.spawn('./tools/stork', ['build', '--input', 'temp/config.toml', '--output', 'public/assets/index.st']);
  })
}
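
The slug handling in the sample above can be pulled out into a small helper, which makes it easier to check what file name each page ends up with. This is only a sketch: slugToFileName is a hypothetical name, and using path.posix (so slugs are always treated as URL-style paths) is my assumption rather than part of the original setup.

```javascript
const path = require("path");

// Hypothetical helper: turn a frontmatter slug into the base name of the
// markdown file written to the temp directory.
function slugToFileName(slug) {
  // Strip any directories and extension, e.g. "/blog/post1.html" -> "post1".
  const base = path.posix.basename(slug, path.posix.extname(slug));
  // The root slug "/" has no base name, so fall back to "index".
  return base.length === 0 ? "index" : base;
}

console.log(slugToFileName("/post1"));        // post1
console.log(slugToFileName("/"));             // index
console.log(slugToFileName("/blog/post1/"));  // post1
```

Note that only the last path segment is kept, so two pages such as /blog/intro and /notes/intro would write to the same intro.md file. If your slugs are nested you may want a different naming scheme.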

You can now run the build process with gatsby build, and you will see a temp folder created with the content ready to be indexed. To build the index by hand, run the following command, which generates the index and places it in the assets folder of the site.

stork build --input temp/config.toml --output public/assets/index.st

Creating a Gatsby Component

Once we have generated our index, the next stage is creating a component in our site. I'll go over the basics of implementing a simple search box with a few lines of HTML. You can customize this much further with features like loading the index on click or custom styling, but I won't cover that here. To get a basic search working, we first need to import the JS and CSS libraries, which can be done with the following lines of code.

<link rel="stylesheet" href="https://files.stork-search.net/basic.css"/>
<script src="https://files.stork-search.net/stork.js"></script>

This grabs the CSS and JS files from the Stork CDN, but if you would prefer to self-host Stork you can check out this link. I prefer the self-hosting approach, as you don't need to rely on other sites for hosting, but it does require more work. The next step is to create the input component, which just needs to include the following code. The data-stork attribute on the input is the name of the index we will register below, and the data-stork attribute on the div is the name of the index with -output on the end.

<input data-stork="index"/>
<div data-stork="index-output"></div>

Once we have created this, we need to trigger the download of the index. On this site I download the index when the search is opened, but another option is to do it in an onClick handler for the input field. You will need to use window.stork in Gatsby, as Stork does not currently have a React component, though hopefully one will come soon. The register function needs the name of the index, which I named index, and the location of the index, which should be assets/index.st.

window.stork.register("index", `${window.location.origin}/assets/index.st`);
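
Gatsby renders pages in Node during the build, where window does not exist, so the register call has to run only in the browser. Below is a minimal sketch of a guard around it. registerStorkIndex and the registeredIndexes set are hypothetical names, and registering each index only once is my assumption about what you want rather than a documented Stork requirement.

```javascript
// Names of indexes already registered during this page session.
const registeredIndexes = new Set();

// Hypothetical helper: register a Stork index once, and only in the browser.
// `win` is the global window object (or undefined during the Gatsby build).
function registerStorkIndex(win, name, indexPath) {
  // No window (build time) or stork.js has not loaded yet: do nothing.
  if (!win || !win.stork) return false;
  // Already registered: avoid downloading the index a second time.
  if (registeredIndexes.has(name)) return true;

  win.stork.register(name, `${win.location.origin}${indexPath}`);
  registeredIndexes.add(name);
  return true;
}

// Example call, e.g. from a React effect or the input's focus handler.
const browserWindow = typeof window !== "undefined" ? window : undefined;
registerStorkIndex(browserWindow, "index", "/assets/index.st");
```

Passing the window in as an argument keeps the helper easy to exercise outside the browser, and the false return value tells the caller that the script has not loaded yet and the call can be retried later.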

After adding all of this you should see a basic Stork element that will search based on the index created above.

Integrating with Your CI/CD Process

For the deployment of my site I am using Cloudflare Pages, which uses Ubuntu 16.04 for the CI/CD process. This means I could use one of the prebuilt binaries available on the Stork site during the build process. There are two ways of getting the binary: you can download it on every build, or store it inside your repository. I chose to keep it in my repository so I don't need to rely on another server being up during the build. I added the following code to gatsby-node.js under the exports.onPostBuild function we used above, so that it runs after the build of the application has completed.

const cp = require('child_process');
var child = cp.spawn('./tools/stork', ['build', '--input', 'temp/config.toml', '--output', 'public/assets/index.st']);

With this in place, and with Cloudflare Pages running the npm run build command, the index is built and deployed with the site whenever the master branch is updated. On other platforms such as GitHub Actions, this process could be handled by a pipeline, which would allow for a nicer workflow and more detailed build output.
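
One caveat with cp.spawn is that it returns straight away, so a failed index build can go unnoticed. A possible variation, sketched below, uses the synchronous spawnSync from the same child_process module and throws when the command fails, which should make the build fail as well. runBuildStep is a hypothetical helper name, and the binary path and arguments in the commented call are the ones used earlier in this post.

```javascript
const cp = require("child_process");

// Hypothetical helper: run a command, wait for it, and throw on failure.
function runBuildStep(command, args) {
  // stdio: "inherit" forwards the command's output to the build log.
  const result = cp.spawnSync(command, args, { stdio: "inherit" });

  // result.error is set when the command could not be started at all.
  if (result.error) throw result.error;
  // A non-zero exit status means the command itself reported a failure.
  if (result.status !== 0) {
    throw new Error(`${command} exited with status ${result.status}`);
  }
  return result.status;
}

// Inside exports.onPostBuild this could replace the cp.spawn call:
// runBuildStep("./tools/stork", [
//   "build", "--input", "temp/config.toml", "--output", "public/assets/index.st",
// ]);
```

Blocking is acceptable here because onPostBuild runs once at the very end of the build, after the site has already been generated.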

Summary

Overall, I found that with Stork implemented on my site the search is fast and works well for finding articles, without having to run any servers or increase running costs. The implementation wasn't straightforward, though, and a native React component would make it much better and more optimized. You can test out Stork on this site by clicking the search button above. If you are reading this on Medium or elsewhere, feel free to head over here to check it out.

© Josh Mangiola 2024