
GitHub project discoveries of June 2024

Emilien Jegou, Fri June 07 2024

It's been a while since my last blog post about the GitHub projects that piqued my interest, and there have been many interesting developments all around. I feel it's finally time to compile a new curated list of projects that may not have made the headlines but are here to improve our development experience!

I think it's fair to say that this list will be highly personalized. If you were to ask around about the crazy projects that have come up since the start of the year, you would probably be told about the development of AI or new language models, and it would be a fair answer. Right now, out of the 30 most starred GitHub projects created this year, 21 are AI-related, a trend that continues down the list! If I were to pick the most popular projects of the year, this article would inevitably become a “Top AI projects of the year”, which is not my goal.

When selecting projects for this article, I instead focus on projects that fit a few criteria:


LaVague, AI web scraping

#python#AI#Scraping

LaVague, meaning "the wave" in French, is a tool designed to simplify web scraping through AI instructions. The popularization of JavaScript and CSS frameworks and the resulting “name mangling” of classes and IDs force developers to use complex and fragile JavaScript or XPath queries to scrape web pages. When a site is modified and one of those queries breaks, the developer has no choice but to understand the page changes, debug the script, retest it, and sometimes redeploy it. If you have one scraping task, you should be fine, but if you are managing multiple web scrapers on multiple websites, this can quickly turn into a debugging nightmare.

LaVague remedies this issue by using a language model capable of understanding HTML documents. If a button is modified and changes appearance or placement on the page, the AI should still be able to retrieve it with ease. On the few occasions it doesn't, the developer has a much easier time debugging a script that works through flexible instructions instead of rigid document queries.
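To make the fragility problem concrete, here is a small standard-library sketch. It does not use LaVague at all; the two page snippets, the class names, and both helper functions are invented for illustration, and the label-based lookup is only a crude stand-in for the kind of flexible instruction an AI agent would follow.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page: a redeploy changed the
# generated ("mangled") class name and wrapped the button in a new <div>.
PAGE_V1 = '<body><div><button class="css-1x2y3z">Sign in</button></div></body>'
PAGE_V2 = '<body><div><div><button class="css-9a8b7c">Sign in</button></div></div></body>'


def find_by_class(page: str, class_name: str):
    """Rigid query: match a button by its exact generated class name."""
    root = ET.fromstring(page)
    return root.find(f".//button[@class='{class_name}']")


def find_by_label(page: str, label: str):
    """Looser query: match a button by the text a user actually sees."""
    root = ET.fromstring(page)
    for button in root.iter("button"):
        if (button.text or "").strip() == label:
            return button
    return None


print(find_by_class(PAGE_V1, "css-1x2y3z") is not None)  # True
print(find_by_class(PAGE_V2, "css-1x2y3z") is not None)  # False: the class changed
print(find_by_label(PAGE_V2, "Sign in") is not None)     # True: the label did not
```

The class-based query silently stops matching after the redeploy, while a query phrased in terms of what the user sees keeps working.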

Here is a demo of the LaVague library using Selenium:

python
from lavague.core import WorldModel, ActionEngine
from lavague.core.agents import WebAgent
from lavague.drivers.selenium import SeleniumDriver

selenium_driver = SeleniumDriver(headless=False)
world_model = WorldModel()
action_engine = ActionEngine(selenium_driver)
agent = WebAgent(world_model, action_engine)
agent.get("https://huggingface.co/docs")
agent.run("Go on the quicktour of PEFT")
LaVague on github

Redka, a Redis clone using SQLite

#database#golang

Redka aims to be a reimplementation of Redis using SQLite as a backend, essentially creating a “disk storage” version of Redis. This project could serve as a practical persistent storage system for desktop applications or embedded programs, offering the robustness and lightweight nature of SQLite combined with the schemaless and simple API of Redis. The roadmap also includes plans to add new features such as publish/subscribe and stream types, which could broaden the tool's applicability, for example, by turning it into a viable option for inter-process communication.

One of the most impressive aspects of Redka is that it doesn't fall short performance-wise. While disk-backed storage is obviously still slower than keeping everything in memory, their benchmark shows that the gap is only around 4 to 5 times.
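The idea of a Redis-style API sitting on top of SQLite can be sketched in a few lines of Python. This is only a toy: the `TinyKV` class, its single `kv` table, and its three methods are invented for illustration and say nothing about Redka's actual schema or implementation. The upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3
from typing import Optional


class TinyKV:
    """Toy Redis-like string store on SQLite (illustrative, not Redka's design)."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist data on disk.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def set(self, key: str, value: str) -> None:
        # Upsert, so SET overwrites an existing key.
        with self.db:
            self.db.execute(
                "INSERT INTO kv (key, value) VALUES (?, ?) "
                "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
                (key, value),
            )

    def get(self, key: str) -> Optional[str]:
        row = self.db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

    def incr(self, key: str) -> int:
        # One statement: create the key at 1, or add 1 to the stored number.
        with self.db:
            self.db.execute(
                "INSERT INTO kv (key, value) VALUES (?, '1') "
                "ON CONFLICT(key) DO UPDATE "
                "SET value = CAST(CAST(value AS INTEGER) + 1 AS TEXT)",
                (key,),
            )
        return int(self.get(key))


store = TinyKV()
store.set("name", "alice")
print(store.get("name"))     # alice
print(store.incr("visits"))  # 1
print(store.incr("visits"))  # 2
```

Every call goes through a SQL statement and a commit, which hints at why a disk-backed store pays a cost compared to a purely in-memory one.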

redka on github

River, safer reverse proxy in Rust

#networking#rust#reverse proxy

At the start of the year, Cloudflare open-sourced their in-house networking framework called Pingora with the goal of improving the memory safety of critical applications. Almost immediately after their announcement, River, a reverse proxy application, arrived on the scene with the goal of making Cloudflare's framework more accessible for everyone's use.

Effectively, River aims to become a modern replacement for Nginx, which was released in 2004 and is still arguably the most used reverse proxy application in the world. Some of their selling points are:

Although the project is still experimental and some features are still in active development, you can already give it a try in your projects today.

Here is a very basic example of a TOML configuration for TCP forwarding:

toml
[[basic-proxy]]
name = "Example"
listeners = [
    { source = { kind = "Tcp", value = { addr = "0.0.0.0:8000" } } }
]
connector = { proxy_addr = "91.107.223.4:80" }
river on github

GritQL, find and replace on the AST

#linter#rust

GritQL is a declarative query language for searching and modifying code. One of its main uses is to simplify the creation of linter and preprocessing rules for your projects. Instead of relying on complex regex statements that may break depending on your project formatting, GritQL provides you with a way to interact directly with your selected language's AST (Abstract Syntax Tree), allowing you to set rules on its nodes.

It currently supports more than 10 languages and can be seen as an all-encompassing tool that you can learn once and use everywhere! For that reason, I find GritQL to be an extremely valuable tool to learn and one that will stay relevant over time.
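To illustrate why matching on the AST is sturdier than matching on text, here is a small sketch that uses Python's built-in `ast` module rather than GritQL itself. The `PrintToLogging` class and the `rewrite` helper are names invented for this example, and it assumes Python 3.9 or newer for `ast.unparse`.

```python
import ast


class PrintToLogging(ast.NodeTransformer):
    """Rewrite `print(<args>)` calls into `logging.info(<args>)`."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # handle nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "print":
            node.func = ast.Attribute(
                value=ast.Name(id="logging", ctx=ast.Load()),
                attr="info",
                ctx=ast.Load(),
            )
        return node


def rewrite(source: str) -> str:
    """Parse the source, apply the rule, and print the tree back as code."""
    tree = PrintToLogging().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# An oddly formatted call that a naive regex such as r"print\((.*)\)" would miss.
messy = "print (\n    'hello',\n)\n"
print(rewrite(messy))             # logging.info('hello')

# A string that merely contains "print(" is left alone.
print(rewrite('x = "print(1)"'))  # x = 'print(1)'
```

Because the rule is written against call nodes, whitespace, line breaks, and look-alike strings no longer matter, which is the same property GritQL patterns rely on.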

This is how you would use it for find and replace:

bash
grit apply '`console.log($msg)` => `winston.log($msg)`'

Plenty of examples are provided along with the tool. You can find them all in this repository under .grit. This repository also contains the GitHub Actions needed to start using GritQL in your CI. Below is an example of a linter rule written in GritQL:

yaml
patterns:
  - name: use_winston
    level: error
    body: |
      `console.log($msg)` => `winston.log($msg)` where {
        $msg <: not within or { `it($_, $_)`, `test($_, $_)`, `describe($_, $_)` }
      }

gritql on github

Iceoryx2, multi OS zero-copy IPC

#IPC#multi-os#rust

This is my latest and favorite find of the (half-)year; I just wish I had found it sooner! Iceoryx2 is an extremely performant IPC library that uses shared memory for communication. It already supports Windows, macOS, and Linux.
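For readers unfamiliar with shared-memory IPC, the basic idea can be sketched with Python's standard `multiprocessing.shared_memory` module. This is not iceoryx2 and offers none of its features; it only shows two handles reading and writing the same memory segment by name instead of pushing bytes through a pipe or socket. The 8-byte layout is an assumption made for the example.

```python
import struct
from multiprocessing import shared_memory

# "Publisher" side: create a named shared-memory segment and write a payload.
segment = shared_memory.SharedMemory(create=True, size=8)
struct.pack_into("<Q", segment.buf, 0, 1234)

# "Subscriber" side: attach to the same segment by name. In a real setup this
# would run in another process; it is done in-process here to stay short.
reader = shared_memory.SharedMemory(name=segment.name)
(value,) = struct.unpack_from("<Q", reader.buf, 0)
print(value)  # 1234

# Release both handles, then remove the segment from the system.
reader.close()
segment.close()
segment.unlink()
```

The payload is never sent anywhere: the reader looks at the very bytes the writer filled in, which is where the "zero-copy" in the title comes from.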

If I had to compare it to other IPC solutions on Linux, here is how I would describe its advantages:

Depending on the language you use, you may have to wait for the bindings to be implemented. As of the time I am writing this, it only supports Rust, but there are already plans to make libraries for C, C++, Python, and C#. It is also worth noting that even though it is already a cross-platform library, some features may be missing depending on your OS.

The following is a simple example usage of the library that uses the default configuration; no permissions are configured on the shared memory:

rust
use iceoryx2::prelude::*;

const CYCLE_TIME: core::time::Duration = core::time::Duration::from_secs(1);

fn publish_send_one() -> eyre::Result<()> {
    let service_name = ServiceName::new("My/Funk/ServiceName")?;
    let service = zero_copy::Service::new(&service_name)
        .publish_subscribe()
        .open::<usize>()?;

    let publisher = service.publisher().create()?;
    publisher.loan_uninit()?.write_payload(1234).send()?;
    Iox2::wait(CYCLE_TIME); // wait for message to be received

    Ok(())
}

fn subscribe() -> eyre::Result<()> {
    let service_name = ServiceName::new("My/Funk/ServiceName")?;
    let service = zero_copy::Service::new(&service_name)
        .publish_subscribe()
        .open_or_create::<usize>()?;

    let subscriber = service.subscriber().create()?;

    while let Iox2Event::Tick = Iox2::wait(CYCLE_TIME) {
        while let Some(sample) = subscriber.receive()? {
            println!("received: {:?}", *sample);
        }
    }

    Ok(())
}
iceoryx2 on github