ComfyUI Tutorial

Artificial intelligence has acquired the ability to create detailed and complex images from pure text descriptions. The technological foundation for this is deep AI models that act as digital engines for image generation: they translate written concepts into visual data and generate entirely new graphics from them. To control image generation precisely, users need an appropriate user interface. This is where ComfyUI comes into play. ComfyUI is a flexible and powerful graphical interface designed for working with a variety of AI models. Unlike programs that hide their processes behind simple menus, ComfyUI uses a modular, node-based approach. Each step of image generation, from selecting the model to the finished image, is represented as an individual building block. The user connects these blocks visually and thereby constructs the entire workflow themselves. This method offers transparency and control over the whole generation process and lets users steer the behavior of the underlying AI down to the smallest detail. ...
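The node graph described above can be made concrete with a small sketch. When a ComfyUI workflow is exported in its API format, it is a JSON object mapping node ids to a node type and its inputs, and a link between blocks is encoded as a ["source_node_id", output_index] pair. The graph below is a minimal, illustrative example (the node class names follow ComfyUI's built-in nodes, but the checkpoint name, prompt, and parameters are placeholder assumptions), together with a small check that every link points at an existing node:

```python
# Minimal sketch of a ComfyUI workflow in its exported "API format":
# a dict mapping node ids to {"class_type": ..., "inputs": ...}.
# Checkpoint name, prompt text, and sampler settings are illustrative.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a mountain lake at sunrise", "clip": ["1", 1]}},
    "3": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "4": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["2", 0], "latent_image": ["3", 0],
                     "seed": 0, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
}

def referenced_nodes(wf):
    """Collect the ids of all upstream nodes that links point to."""
    refs = set()
    for node in wf.values():
        for value in node["inputs"].values():
            # A link is encoded as ["source_node_id", output_index].
            if isinstance(value, list) and len(value) == 2:
                refs.add(value[0])
    return refs

def is_well_formed(wf):
    """Every link must point at a node that exists in the graph."""
    return referenced_nodes(wf) <= set(wf)
```

A dict like this is what a running ComfyUI instance accepts as a prompt; the point here is only to illustrate how the visual building blocks map to a plain data structure.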

May 15, 2025 · Aaron

Analysis of unstructured documents with "Unstructured"

In this test, the open-source framework unstructured is used to evaluate the extraction of text from unstructured documents. The goal is to assess how suitable unstructured is for practical use in AI-based information systems – especially with respect to text extraction, semantic preparation (chunking/tokenization), and subsequent embedding generation for vector-based retrieval systems. As an example, the PDF file pm-partnerschaft-stackit was used for analysis with unstructured. To run the unstructured library, the official Docker image is used. It contains all required dependencies (e.g. Tesseract, Poppler, Python libraries) and allows immediate use without a local Python installation. ...
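unstructured parses a document into a list of typed elements (Title, NarrativeText, Table, ...), which are then grouped into chunks for embedding. As a rough illustration of the "semantic preparation" step mentioned above, here is a simplified, pure-Python sketch of title-based chunking over (type, text) tuples – not the library's actual implementation, whose chunk_by_title function is considerably more capable:

```python
def chunk_by_title(elements, max_chars=500):
    """Simplified sketch of title-based chunking: start a new chunk at
    every Title element, and also when a chunk would exceed max_chars.
    `elements` is a list of (element_type, text) tuples, a stand-in for
    the typed elements the unstructured library returns."""
    chunks, current = [], []
    for etype, text in elements:
        size = sum(len(t) for _, t in current)
        if current and (etype == "Title" or size + len(text) > max_chars):
            chunks.append(" ".join(t for _, t in current))
            current = []
        current.append((etype, text))
    if current:
        chunks.append(" ".join(t for _, t in current))
    return chunks

# Invented sample elements standing in for a parsed PDF:
elements = [
    ("Title", "1. Introduction"),
    ("NarrativeText", "The partnership was announced in 2024."),
    ("Title", "2. Scope"),
    ("NarrativeText", "Both parties agree on a joint cloud offering."),
]
chunks = chunk_by_title(elements)
```

Each resulting chunk would then be passed to an embedding model and stored in a vector database for retrieval.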

May 14, 2025 · Aaron

Digital Shopping List with React & Supabase

I started this project to learn React in a practical way – and not just follow tutorials. I wanted to implement a realistic frontend scenario that includes typical requirements like user authentication, data management, user interactions, and dynamic UI components. Instead of building my own backend, I consciously chose Supabase – a backend-as-a-service platform that is ideal for learning and prototyping purposes. This allowed me to fully focus on the React ecosystem, including routing, state, component structure, and responsive UI. ...

May 5, 2025 · Aaron

Emotional Music Evaluation with MindsDB and GPT-4 Based on Spotify Data

MindsDB is an open-source platform designed to enable machine learning, time series analysis, and the integration of large language models directly into traditional database workflows. The platform makes AI functionality accessible through simple SQL queries without requiring a separate machine learning infrastructure. In this post we introduce one of the many features of MindsDB: calling a large language model (GPT-4) via a predefined template that is dynamically populated with database values. The goal is to automatically assess the emotional impact of songs from an existing Spotify dataset. Only a portion of MindsDB’s full feature set is used to demonstrate the basic workflow and the interaction between the database and the LLM. ...
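The core mechanism – a prompt template that is dynamically populated with database values – can be sketched in a few lines of Python. The {{column}} placeholder style mirrors MindsDB's prompt_template option; the template wording and the sample rows are illustrative assumptions, not the actual Spotify dataset:

```python
import re

# A prompt template with {{column}} placeholders, in the style MindsDB
# uses for its prompt_template option. Wording and rows are invented.
template = (
    "Rate the emotional impact of the song '{{title}}' by {{artist}} "
    "on a scale from 1 (calm) to 5 (intense). Answer with a number."
)

def render(template, row):
    """Replace every {{column}} placeholder with the row's value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), template)

rows = [
    {"title": "Clocks", "artist": "Coldplay"},
    {"title": "Lose Yourself", "artist": "Eminem"},
]
prompts = [render(template, row) for row in rows]
```

In MindsDB itself, such a template is supplied in SQL when the GPT-4-backed model is created, and the platform fills in the placeholders row by row before calling the LLM.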

April 29, 2025 · Aaron

Real-time Facial Animation for Metahumans with Live Link Face in Unreal Engine 5

The transmission of facial expressions in real time to digital characters is an important component of modern animation and visualization processes. With Epic Games’ Live Link Face App and Unreal Engine 5, the facial movements of a real person can be precisely transferred to a digital Metahuman character. A prerequisite for this is an iPhone with an integrated TrueDepth camera that is connected via a local network to the computer running Unreal Engine. This tutorial shows how to set up the Live Link Face App and connect it to the engine, how to correctly prepare the Metahuman, and how to finally transmit the facial data live. The goal is to establish a working real-time connection in which the Metahuman moves synchronously with the facial expressions of the real person. ...

April 20, 2025 · Aaron

Omniverse: Audio2Face Tutorial

Audio2Face is an AI-powered tool within NVIDIA Omniverse specifically designed to generate realistic facial animations based solely on audio. It is part of the Omniverse platform, which provides a real-time collaboration and simulation environment for 3D workflows. Audio2Face uses a neural network to automatically convert spoken language into lively facial expressions and movements. Typically, Audio2Face is used to make characters in games, films, or digital avatars speak without complex keyframe animation. The generated movements can either be used directly or transferred to custom 3D characters, which is particularly interesting for virtual productions, digital twins, or interactive applications. ...

April 12, 2025 · Aaron

AI-powered Event Agent

In this project, I developed an AI agent that automatically analyzes events from the NRW region and filters them according to personal criteria. The goal was to filter out only those events that are truly relevant – based on an individually defined prompt. This image shows the list of over 350 events taking place on a single day in Düsseldorf. Across all of NRW, there are several thousand events in one day. ...
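With several thousand candidate events per day, it is useful to cut the list down cheaply before any LLM sees it. The following sketch shows such a keyword pre-filter; the criteria and sample events are invented for illustration and are not the actual prompt or data used in the project:

```python
def prefilter(events, required_any, excluded):
    """Cheap keyword pre-filter applied before any LLM call: keep an
    event if its text mentions at least one wanted term and no excluded
    term. Criteria and sample events are illustrative assumptions."""
    kept = []
    for event in events:
        text = (event["title"] + " " + event["description"]).lower()
        if any(term in text for term in required_any) and \
           not any(term in text for term in excluded):
            kept.append(event)
    return kept

events = [
    {"title": "Salsa Open Air", "description": "Dance night in Düsseldorf"},
    {"title": "Flea Market", "description": "Second-hand goods all day"},
    {"title": "Jazz & Salsa Brunch", "description": "Live music, flea market stands"},
]
relevant = prefilter(events, required_any=["salsa", "jazz"], excluded=["flea market"])
```

Only the events that survive this pass would then be handed to the LLM together with the individually defined prompt for the finer relevance judgment.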

April 11, 2025 · Aaron

LLMs Are Not a Cure-All: Practical Test for Music Classification Based on Metadata

The question was whether current Large Language Models (LLMs) such as GPT-4 or DeepSeek are able to automatically and reliably classify music tracks, specifically salsa songs, based on title, artist, lyrics, and metadata into "Salsa Cubana" or "Salsa Línea". It was known that the available information (metadata, genre tags, lyrics) is incomplete and partly inconsistent. The test was explicitly designed to determine the practical limits of today's LLMs in this context. ...

April 3, 2025 · Aaron

Omniverse Tutorial

What is Omniverse? Omniverse is a platform from NVIDIA that allows you to create, connect, and simulate virtual 3D worlds – all in real time. Omniverse is an open platform for developers, designers, engineers, researchers, and creatives to:
- Connect 3D applications (e.g. Blender, Maya, Unreal Engine)
- Collaborate in a single scene – live and simultaneously
- Create physically realistic simulations and AI-driven applications
What is Omniverse used for? Design, visualization & simulation of objects such as vehicles in real time. ...

March 27, 2025 · Aaron

Set up Wan 2.1 with ComfyUI including local GPU support

ComfyUI is a node-based user interface for controlling and modifying AI models for image and video creation. Wan 2.1 is a text-to-video model (T2V) specifically developed for generating videos based on text inputs. This guide provides step-by-step instructions on how to set up ComfyUI with Wan 2.1 locally. Each section explains the required components, why they are necessary, and how to install them correctly. This guide assumes Python 3.10 and a GPU with CUDA support. ...
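The guide's prerequisites – a suitable Python version and a CUDA-capable GPU – can be verified up front. The sketch below checks the interpreter version (assuming 3.10 as the minimum) and, if PyTorch is installed, asks it whether CUDA is available; the torch import is kept optional, so a missing installation simply reports "unknown":

```python
import sys

def python_ok(version_info, required=(3, 10)):
    """True if the interpreter is at least the required major.minor."""
    return (version_info[0], version_info[1]) >= required

def cuda_available():
    """Report CUDA availability via PyTorch if it is installed.
    torch is an optional dependency here, so an ImportError just
    means 'unknown' (None) rather than a hard failure."""
    try:
        import torch  # assumed to be installed per the guide
    except ImportError:
        return None
    return torch.cuda.is_available()

if __name__ == "__main__":
    print("Python version OK:", python_ok(sys.version_info))
    print("CUDA available:", cuda_available())
```

If either check fails, it is worth fixing the environment before installing ComfyUI and the Wan 2.1 model files, since both depend on a working CUDA-enabled PyTorch build.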

March 1, 2025 · Aaron