Async in-memory SQLite/SQLAlchemy database for FastAPI

Hello friends!

Today I’m presenting the database configuration that I (currently) use on pythondocs.xyz – real time interactive search of Python documentation.

It copies a SQLite database from disk into memory, so it’s very fast. It’s great for read-only workflows – dashboards and the like. It’s not suitable for sites that accept user input, as it makes no attempt to preserve updates to the database.

The config works well for pythondocs.xyz: I generate the site’s database “offline”, with a standalone parser application, and I ship the resulting database file with the web application. When the web app starts up, the database is copied into memory, and you get nice fast database access (even if your queries aren’t super efficient!)

The main dependencies are sqlalchemy, the predominant Python ORM, and aiosqlite, an async replacement for the Standard Library’s sqlite3. I use the database with FastAPI but it should work in other applications.

The database copying is handled by sqlite3‘s backup method. But sqlite3 is a synchronous library, and I want concurrent database access for performance reasons. Luckily, it’s possible to populate the database with sqlite3 and read it from aiosqlite by pointing the two libraries at the same shared memory location.

Without further ado, here’s the code that sets up the database:

from typing import Optional

from sqlalchemy.engine import Engine, create_engine
from sqlalchemy.ext.asyncio import AsyncEngine, AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

SQLITE_SYNC_URL_PREFIX = "sqlite:///"
SQLITE_ASYNC_URL_PREFIX = "sqlite+aiosqlite:///"
MEMORY_LOCATION_START = "file:"
MEMORY_LOCATION_END = "?mode=memory&cache=shared&uri=true"


class InMemoryDatabase:
    """
    Async in-memory SQLite DB
    """

    def __init__(self, sql_echo: bool = False):
        self.sql_echo = sql_echo
        self._sync_memory_engine: Optional[Engine] = None
        self._async_memory_engine: Optional[AsyncEngine] = None
        self._async_sessionmaker: Optional[sessionmaker] = None

    def setup(self, filename: str):
        """
        Copy DB data from disk to memory and setup async session
        """
        sync_disk_engine = create_engine(
            url=SQLITE_SYNC_URL_PREFIX + filename, echo=self.sql_echo
        )
        in_memory_url = MEMORY_LOCATION_START + filename + MEMORY_LOCATION_END
        # Reference to sync in-memory engine remains open
        self._sync_memory_engine = create_engine(
            url=SQLITE_SYNC_URL_PREFIX + in_memory_url, echo=self.sql_echo
        )
        # Use sync engines to copy DB to memory
        backup_db(source_db=sync_disk_engine, target_db=self._sync_memory_engine)
        sync_disk_engine.dispose()
        # Create async engine at same memory location
        self._async_memory_engine = create_async_engine(
            url=SQLITE_ASYNC_URL_PREFIX + in_memory_url, echo=self.sql_echo
        )
        self._async_sessionmaker = sessionmaker(
            self._async_memory_engine, class_=AsyncSession
        )

Compatibility with FastAPI’s dependency injection is provided by this method:

    async def __call__(self) -> AsyncIterator[AsyncSession]:
        """Used by FastAPI Depends"""
        assert self._async_sessionmaker, "No sessionmaker. Run setup() first."
        async with self._async_sessionmaker() as session:
            yield session

(Thank you to the FastAPI Pagination project for inspiration!)

Use with FastAPI looks like this:

from fastapi import Depends, FastAPI
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from async_in_memory_db import InMemoryDatabase
from example_data import DB_FILENAME, User

app = FastAPI()
db = InMemoryDatabase()


@app.on_event("startup")
async def setup_db():
    db.setup(DB_FILENAME)


@app.get("/")
async def example_route(session: AsyncSession = Depends(db)) -> list[User]:
    results = await session.execute(select(User))
    return results.scalars().all()

And here’s what you get in your web browser:

JSON response from in-memory DB

Please see the python_async_in_memory_db GitHub repo for the full code, including an example standalone query that doesn’t use FastAPI.

Is this technique useful to you? Can you see any potential pitfalls that I’ve overlooked?

Let me know in the comments below!

Extract Microsoft To Do steps/sub-tasks from your web browser (with Asana import example)

Hey. Are you stuck in the year 2020?

Are you trying backup Microsoft To Do tasks with PowerShell and Microsoft Graph?

Have you become infuriated because the Graph API doesn’t let you export sub-tasks?

Well, the good news is that in 2022 – in the future – Microsoft Graph will add support for what they call a checklistItem resource type, which you could very possibly add to the PowerShell solution.

But for the rest of this blog post, let’s pretend that’s not the case, because there is a neat way of extracting all your Microsoft To Do data from your very own web browser.

So stop washing that orange, pick up the nearest fidget spinner, and let’s get crocing…

Index success

When you open To Do in your web browser, you see all of your lists, tasks, sub-tasks, and so on. If you disconnect from the internet, you can still click through everything. So “the stuff” must be somewhere on your device. But what is the stuff and where does it live?

Microsoft To Do is a single-page application and it stores its data in IndexedDB, a client-side data store built into modern web browsers. Here’s what my To Do database looks like in Chrome’s developer tools:

And here’s an example from my lists, the somewhat-useful “List o list”:

You see that the list has a unique identifier – id. As you might hope, a task has a field called list_id, which links it to its parent. A task has an id of its own, which is referenced by a step (sub-task) as – you guessed it – task_id. It’s IDs all the way and the data we need is clearly there. But how do we get it?

Requesting extraction

Thankfully, Florian Reuschel has written a fantastic code snippet that lets you dump an IndexedDB database to JSON in your dev tools console. I’ve put some step-by-step instructions on GitHub here.

The JSON file on its own serves as a form of backup – all of your data is present, even if it’s not in an immediately useful format.

I’ve also put a Python script on GitHub that converts the JSON file to Asana’s CSV Importer format.

It was a “write once, run once” kind of thing, made for someone who got in touch via my blog (hi Charlie!). It’s not a masterpiece of modular software design and it has to deal with some quirks in Asana’s API. But I think it’s fairly easy to reason with and hopefully can serve as inspiration if you need to do something similar.

Does it even work?

Yes!

From this:

The OG “List o list”

Via this:

Dumping IndexedDB from browser dev console
JSON to Asana CSV conversion in Python

To this:

“List o list” in Asana land, with sub-tasks included

The process relies on undocumented behaviour, so a solution using the updated Graph API would strictly be more correct (but less fun).

However, at the time of writing, the whole thing works suprisingly well, so if you do want to migrate from Microsoft To Do from Asana it might be worth trying. As ever, make sure you test with some dummy data/test accounts before you commit to anything permanent.

The code and instructions are over here on GitHub.

A very simple async Response cache for FastAPI

Hello world! How’s it going out there? Monkey pox scars healing OK? Great.

My most recent project is pythondocs.xyz – real time interactive search of Python documentation.

If you haven’t checked it out yet, please take a look and let me know what you think over here.

The site is built with FastAPI and I wanted to make it as fast as possible. In particular, I wanted the home page to load almost instantly. The home page is constructed from a couple of database queries and I realised I could reduce load times by building the page once then caching it for future visitors.

But how?

Decorated service

Here’s the solution I came up with, with no external dependencies.

To start, some imports:

import asyncio
from functools import wraps

from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates

The main course is where you find the meat:

def cache_response(func):
    """
    Decorator that caches the response of a FastAPI async function.

    Example:
    ```
        app = FastAPI()

        @app.get("/")
        @cache_response
        async def example():
            return {"message": "Hello World"}
    ```
    """
    response = None

    @wraps(func)
    async def wrapper(*args, **kwargs):
        nonlocal response
        if not response:
            response = await func(*args, **kwargs)
        return response

    return wrapper

And what’s for dessert? Oh, my favorite: objects! They’ll come in handy for the examples below…

app = FastAPI()
templates = Jinja2Templates(directory="templates")

What’s going on?

The idea is that you insert the @cache_response decorator between your async function and its FastAPI path operation decorator.

Here’s a simple example:

@app.get("/json/")
@cache_response
async def json_example():
    await asyncio.sleep(2)
    return {"message": "Hello World"}

When someone navigates to /json/, the json_example function is passed into the wrapper as func.

On the first visit, the wrapper awaits func, stores its response for next time, and returns the response.

On subsequent visits, the response is returned directly, func is never called, and whatever expensive operations it contains don’t slow things down.

Template tantrum

The decorator also works with functions that have parameters, like this Jinja template route:

@app.get("/", response_class=HTMLResponse)
@cache_response
async def home_page(request: Request):
    await asyncio.sleep(2)
    return templates.TemplateResponse("hello_world.html", context={"request": request})

But beware that the first response is always stored and reused, even if called with different arguments!

Is this a terrible idea?

I don’t think it’s, like, the worst idea ever. It’s not like writing your Social Security Number on your front door.

But this is computers and there’s a million ways of doing everything, with various strengths and weaknesses.

I think these are the main downsides of this very simple approach:

  • The cached data is stored in-process and so won’t be shared with other workers.
  • There’s no way of clearing the cache (other than restarting Python).
  • A response isn’t cached until the first call completes. Additional requests received before this happens will still trigger your expensive function.

Here are some alternative approaches, which may work better for your case:

  • Cache intermediate steps in your function (rather than the whole thing), perhaps with functools.cache.
  • Use a dedicated caching service, like Redis, or a database.
  • Cache static assets with a Content Delivery Network, like Cloudflare.

Thanks for reading

More Python stuff coming soon.

Let me know if you have any comments on this post or ideas for the future!

The full code for this post is here on GitHub.

Introducing pythondocs.xyz – live search for Python documentation

Winter is long here.

It is so long that I did an accidental software development after False Spring 3 – whoops!

pythondocs.xyz demo

pythondocs.xyz

pythondocs.xyz is a web tool that provides live search results for Python’s official documentation.

Please try it out and let me know what you think!

It’s at “beta” stage, which means it works pretty well but it’s not perfect.

It’s fast and it looks good and the results are… fine.

It did, however, survive the front page of HackerNews without going above 2% CPU usage, which I think is pretty good.

The next big feature will be better search results. In particular: improved prominence of important language features, like built-in functions, and refined full text search and ordering of results.

Here’s the tech stack as it currently stands, for those interested:

  • Parser: Beautiful Soup + Mozilla Bleach
  • Database: in-memory SQLite (aiosqlite) + SQLAlchemy
  • Web server: FastAPI + Uvicorn + Jinja2
  • Front end: Tailwind CSS + htmx + Alpine.js

This is my first big FastAPI project, and over the next few weeks I’ll blog about some of the tricks I used, especially to do with performance.

Python & VS Code: make Black and organize imports work together on save

Hello world. What a stable and positive time to be alive!

But at least we can fix our code editors.

I use Black to format my Python code, and I tell VS Code to organise imports on save.

Here’s the relevant bit of VS Code’s settings.json:

{
    "python.formatting.provider": "black",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
        "source.organizeImports": true
    }
}

It’s a pretty pleasant experience, mostly. But this “flapping” behaviour when I save has been bugging me for weeks:

(It’s even more annoying when you’re part way down a file and the entire screen jerks up and down.)

Clearly, Black and the organize imports thing are fighting. But what is the organize imports thing? And what’s the fix?

It turns out that VS Code uses the isort library to sort imports, and isort has a Black compatability profile.

As a Poetry user, all I had to do is add these lines to the pyproject.toml file at the root of my project:

[tool.isort]
profile = "black"

Alternatively, create a file called .isort.cfg (note the leading dot) at the root of your project, with this content:

[settings]
profile=black

No more flapping, and at least one small corner of the universe is a little more sane.