Proof Of Concepts

Turning my shower thoughts into reality.

AI-assisted Development on a VPS

I migrated my Linode instance to Hetzner. During the migration, I realized most of the services I'm self-hosting are no longer useful to me. At the same time, AI tooling in my workplace has opened my eyes to the possibility of AI-assisted development. This guide details how I set up Hermes on a Hetzner VPS, and how I plan to use it for vibe-coding and rearchitecting my toy application, a bill-splitting Telegram application called Nanasplits.

# Setting up Hetzner

## Buying a shared server

Hetzner currently has the cheapest VPS option for its specs. My current setup, 2 vCPU and 4GB of RAM with 40GB of storage, costs me €5.59/month.

Selection options to take note of:

- Image: Went with Ubuntu 24.04 for a batteries-included experience, but I would go with something more efficient, such as Fedora, next time.
- Networking: Public IPv4 is a **must**. [Cloning from GitHub requires IPv4 addresses.](https://github.com/orgs/community/discussions/151477)
- SSH Keys: Set up SSH keys and use your terminal to access the VPS. Hetzner's web console is absolutely atrocious, and pasting from the clipboard mangles the pasted content. Just logging in will be hell.
- Firewall: There are 2 layers of firewall, Hetzner's own and Ubuntu's UFW. We will set up and duplicate the rules on both firewalls; Hetzner's has the benefit of blocking traffic before it reaches your VPS, but UFW is more intuitive to manage.

## User setup

Once the VPS has started up, `ssh` into the server as the root user.

```bash
ssh root@<IP_ADDRESS>
```

Once in, update the system and install some necessary utilities.

```bash
apt update && apt upgrade -y
apt install -y curl git ufw unzip
```

### Create and run Hermes as a non-root user

It is generally good practice to run applications as a non-root user for security reasons.

```bash
adduser hermes
usermod -aG sudo hermes
```

Copy your SSH public key to the new user's `authorized_keys` file to allow SSH access.

```bash
# adduser does not create ~/.ssh, so make it first
mkdir -p /home/hermes/.ssh
cp ~/.ssh/authorized_keys /home/hermes/.ssh/authorized_keys
chown -R hermes:hermes /home/hermes/.ssh
chmod 700 /home/hermes/.ssh
chmod 600 /home/hermes/.ssh/authorized_keys
```

Now, test that you can log in as the new user.

```bash
ssh hermes@<IP_ADDRESS>
```

## Firewall setup

### Tailscale private VPN

## Hermes installation

### Use on Telegram

## Public facing web app

### Cloudflare DNS

### Caddy reverse proxy

# Setting up a public-facing web service

The goal I want to achieve is to be able to guide the development of _**and test**_ new features on Nanasplits with Hermes from my phone, while I'm out and about. The idea goes something like this:

1. The dev server is exposed to the public internet and accessed via Telegram.
2. Hermes makes changes to the local copy of the codebase.
3. The dev version of the Telegram mini-app gets hot-reloaded with the new code, and I can test the changes on the spot.
4. Iterate and repeat.

Once the feature is ready, I can then merge the code to master. The build is done on the VPS, and deployment is just a `systemctl reload` of the service serving the `./dist` folder. Essentially, the VPS instance is the source of truth while GitHub is just a mirror.
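The merge, build and reload step described above could be scripted roughly as follows. This is only a sketch under assumptions, not the actual Nanasplits setup: the checkout path `/home/hermes/nanasplits`, the `caddy` unit name, the `npm run build` command, and the `my-feature` branch name are all hypothetical placeholders. It defaults to a dry run that only prints the commands it would execute.

```bash
#!/usr/bin/env bash
# Hypothetical deploy sketch: merge a finished feature branch, build, reload, mirror.
set -euo pipefail

REPO_DIR="${REPO_DIR:-/home/hermes/nanasplits}"  # assumed checkout path
SERVICE="${SERVICE:-caddy}"                      # assumed unit that serves ./dist
DRY_RUN="${DRY_RUN:-1}"                          # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  local branch="${1:?usage: deploy <feature-branch>}"
  run git -C "$REPO_DIR" checkout master
  run git -C "$REPO_DIR" merge --ff-only "$branch"
  run npm --prefix "$REPO_DIR" run build       # assumed build command
  run sudo systemctl reload "$SERVICE"
  run git -C "$REPO_DIR" push origin master    # GitHub is only a mirror
}

deploy "${1:-my-feature}"
```

Running the script with `DRY_RUN=0` would execute the steps for real. The `--ff-only` merge is a deliberate choice in this sketch: it refuses to deploy if master has diverged from the feature branch, which suits a setup where the VPS is the source of truth.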

Static Vector Search

An idea popped into my head while checking out this search solution for statically generated sites: [Pagefind](https://pagefind.app/). Pagefind works by building an index of your site's content at build time, and then uses that index to perform searches on the client side. Bandwidth is saved by using an ordered index, so that only a portion of the index needs to be downloaded, based on metadata.

This inspired me to try building a static vector search solution. I have also been meaning to try out running GPU-accelerated code in the browser. For large sites, parallelization and GPU acceleration would greatly enhance the user experience, if they are possible.

## Sketch

There are quite a few Approximate Nearest Neighbor (ANN) algorithms, but the one that seems to be the most popular is HNSW, which is basically a skip list in higher dimensions. Unfortunately, each layer strictly increases in size, so there is no way to save bandwidth by only downloading a portion of the index with the traditional algorithm.

Idea: Each node in a layer maps to a distinct layer in the next level, which contains the neighbourhood of that node. This allows us to prune the search space by fixing the size of each layer, akin to a B+ tree, but for higher dimensions. This is also amenable to parallelization, as all the nodes in a layer can be compared to the query in parallel.

There are also some build-time optimizations that are possible with parallelization, but these will be fleshed out in the future.

### Evaluation

I think the main metrics would be recall and precision compared to IVF-PQ, followed by latency and bandwidth usage.

## Does it even make sense?

Generating the embedding for your search term is already going to require a model or an external call. What is the point of static vector search if you need a server to handle the embedding generation? You might as well do the search on the server side too, and return the results. Calling a managed external service also doesn't make sense, since you would be shipping your API key to the client!

The alternative is to run the embedding model on the client. The smallest model I have seen is over 600MB, which completely negates the bandwidth optimizations of the chunked index.

However, I believe that we will reach a point where on-device models become more ubiquitous, which would make the idea feasible. We see some movement in this direction with Apple's [Foundation Models Framework](https://developer.apple.com/documentation/FoundationModels) and Google's [Built-in AI](https://developer.chrome.com/docs/ai/built-in).

This shall be an early proof of concept.
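For a sense of scale, here is a back-of-envelope estimate of what the layered index from the sketch might cost per query. Every number is an assumption chosen for illustration, not a measurement: a fixed layer size of $k = 256$ nodes, a depth of $d = 3$ layers, 384-dimensional float32 embeddings, each node's child layer treated as disjoint from its siblings', and only one layer fetched per level (graph edges and metadata are ignored).

$$
\begin{aligned}
\text{bytes per vector} &= 384 \times 4 = 1536 \\
\text{bytes per layer} &= k \times 1536 = 256 \times 1536 = 393{,}216 \approx 0.39\ \text{MB} \\
\text{bytes per query} &= d \times 393{,}216 = 1{,}179{,}648 \approx 1.18\ \text{MB} \\
\text{vectors reachable} &\le k^{d} = 256^{3} = 16{,}777{,}216
\end{aligned}
$$

Under these assumptions a query would fetch on the order of a megabyte, roughly 500 times less than a 600MB model download. Probing several nodes per level to improve recall would multiply the per-query cost accordingly, but the embedding model, not the index, would still dominate the bandwidth budget.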

TIL

Someone created an agent system that creates targeted resumes for every job application. On one hand, this is not exactly an _**abuse**_ of the system. The human is supposed to review each PDF and application form before sending it out. It benefits the applicant by presenting the most relevant information to the recruiter, and reduces the effort of tailoring the resume for each application. However, someone can easily take this approach and make it completely automated. I think what is likely to happen is that lazy applicants will simply rubber-stamp each AI application and flood the ATS. This sits somewhere in the grey area for me.