Lutan-Connected Device

 WEEK 1

Electronic devices:

1. Phone, computer
2. Raspberry Pi
3. ESP32-S3
4. Bluetooth headphones
5. Stylus pen
6. Elevator button/display screen in my building
7. Card reader for payments
8. Voice-activated light

LOL, I am indeed a homebody.

WEEK 2

I carefully went through the code and related explanations for the SIMPLE-TCP and HTTP-TO-DEV sections from the last lesson again, and that resolved many of my doubts.

My takeaways are:

I now have a clearer understanding of the differences and relationship between servers and clients, and the differences between TCP and HTTP.

In the last lesson, I think we worked with two client-server pairs. The `nc` command establishes a TCP server, and the Arduino acts as a client, actively connecting to the server and sending messages to it.

The command `python3 -m http.server` establishes an HTTP server, and the browser we use acts as the client to access this server.

Linux can run and coordinate both servers at the same time, for example letting the HTTP server read the messages received by the TCP server.
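To make that concrete, here is a minimal Python sketch of the idea (the file name received.txt and the port are just placeholders): a tiny TCP listener appends every incoming line to a text file, and running `python3 -m http.server` in the same directory lets a browser read that file over HTTP.

# tcp_to_file.py - sketch: accept one TCP client (e.g. the Arduino) and append
# every line it sends to received.txt; `python3 -m http.server` run in the same
# directory then serves that file to a browser. Port and file name are assumed.
import socket

HOST, PORT = "0.0.0.0", 8080

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    print("client connected:", addr)
    with conn, open("received.txt", "a") as f:
        for line in conn.makefile("r"):   # read the TCP stream line by line
            f.write(line)
            f.flush()                     # so the HTTP server sees fresh data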



gateway network / VPN (virtual private network)

WEEK 3

2.4GHz!!! for the ESP32-S3
When connecting to Wi-Fi, the ESP32 development board does not support 5 GHz networks; it must use a 2.4 GHz network, which is the band most IoT devices use. I only got the correct SSID and password after logging into my home router's administration panel.

I noticed that my home Wi-Fi's 2.4 GHz and 5 GHz networks use completely different SSIDs and passwords. My Linux system is connected to the 5 GHz network, while the MCU is connected to the 2.4 GHz network. They can still communicate normally, which surprised me, because the two bands look like separate networks; presumably the router bridges both bands onto the same local network, so devices on either band end up on the same LAN.

Wi-Fi on the Raspberry Pi:
I couldn't find the standard network configuration file on my Raspberry Pi.  I had initially configured it to connect to my home Wi-Fi when I flashed the Ubuntu Linux system onto the SD card using the Raspberry Pi's dedicated flashing software.

I took the Pi to school and used it there once, changing my phone's hotspot name and password to match my home Wi-Fi to trick the Pi into connecting. This worked, and the Pi connected, but its IP address changed. I didn't pay attention to this at first, but when I got home, I found that the Pi couldn't connect to my home Wi-Fi anymore. I suspect the configuration was messed up; it seems the two Wi-Fi connections with the same name and password but different IP addresses confused it. I had to connect the Pi directly to my home router with an Ethernet cable to get it working again.

I have a personal project that I want to continue working on throughout this semester.

It's an embodied intelligent desktop robot with a function similar to companionship. What I'm most interested in is its complete "perception-cognition-decision-execution" system architecture. Below are some of my plans and outlines.




I think, besides the heavy workload, the knowledge and skills from this course can be perfectly integrated into this project. This is my goal for this semester.



I built a Wi-Fi connection function that lets me save multiple sets of Wi-Fi SSIDs and passwords and then try them one by one with if/elif-style branching.

//---------- Pack the sensor data and send it over TCP ----------
  if (millis() - lastSend > interval) {
    StaticJsonDocument<512> doc;          // ArduinoJson document (512-byte capacity)

    doc["type"] = "perception";
    doc["device"] = "Lutan-ESPS3";
    doc["ts"] = millis();                 // timestamp: milliseconds since boot

    JsonObject sensors = doc.createNestedObject("sensors");

    JsonObject th = sensors.createNestedObject("temp_humi");
    th["temp"] = temperature;
    th["humi"] = humidity;

    JsonObject pir = sensors.createNestedObject("pir");
    pir["motion"] = PIR_motion;

    JsonObject mic = sensors.createNestedObject("mic");
    mic["audio"] = sample;

    // Serialize and send (one JSON message per line)
    serializeJson(doc, client);
    client.print("\n");

    lastSend = millis();
  }

I have an I2C temperature and humidity sensor, an I2S digital microphone, and a PIR sensor; these are my three inputs. I package their readings into a JSON dictionary and send them over TCP to my Linux system, the Raspberry Pi. I like the dictionary format because it lets me easily access each value by its key.
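For reference, one line on the wire looks roughly like this (the values here are made up):

{"type": "perception", "device": "Lutan-ESPS3", "ts": 123456, "sensors": {"temp_humi": {"temp": 24.5, "humi": 40.2}, "pir": {"motion": true}, "mic": {"audio": 512}}}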

The `nc` command is a bare-bones TCP pipe that is handy for quickly verifying whether a connection works, but it only runs as an ad-hoc foreground or background shell process. That doesn't fit my vision of building a systematic, stable, standalone program.

Furthermore, I will definitely create consumer processes (even an HTTP server) to consume this data later.

Therefore, I need to convert it into a .py script and encapsulate it as a daemon process.
I placed two files in the `project/perception/get_data/` directory: `receive_tcp.py` and `sensor_buffer.py`.

I can't simply use `return` in the `receive_tcp` script to hand back the received data dictionary, because TCP is a continuous, long-lived stream. If the receive function returned a value, each call would only deliver a single message, and waiting for that message would block the rest of the program.

Therefore, a buffer function is needed to store the messages received from the TCP channel, allowing the main program to freely and easily retrieve the data dictionaries from it.

buffer
The buffer contains two functions: one is called in the `receive_tcp` script to update the latest value dictionary; the other is called in `build_perception_state` to retrieve the latest values and build the `perceptionState`.
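Roughly, the buffer might look like the sketch below; the function names `update_latest` and `get_latest` are placeholders, not necessarily the real ones.

# sensor_buffer.py - sketch of the two buffer functions described above
import threading

_lock = threading.Lock()
_latest = {}          # most recent sensor dictionary from the MCU

def update_latest(data: dict):
    """Called by receive_tcp each time a full JSON message arrives."""
    with _lock:
        _latest.update(data)

def get_latest() -> dict:
    """Called by build_perception_state to read the newest values."""
    with _lock:
        return dict(_latest)   # return a copy so callers can't mutate the buffer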

receive tcp
Receive the data stream from the MCU, split it based on newline characters, and then write it to the buffer.
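A minimal sketch of that receiving loop, assuming the buffer interface above and port 8080 for the sensor TCP:

# receive_tcp.py - sketch: listen for the ESP32, split the stream on newlines,
# parse each line as JSON, and push the result into the buffer.
import json
import socket
from sensor_buffer import update_latest   # assumed import path

def run(host="0.0.0.0", port=8080):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    while True:                            # keep accepting if the MCU reconnects
        conn, _ = srv.accept()
        pending = b""
        with conn:
            while True:
                chunk = conn.recv(1024)
                if not chunk:
                    break                  # client disconnected
                pending += chunk
                while b"\n" in pending:    # one JSON message per line
                    line, pending = pending.split(b"\n", 1)
                    try:
                        update_latest(json.loads(line))
                    except json.JSONDecodeError:
                        pass               # skip partial or corrupt lines

if __name__ == "__main__":
    run()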

build perception state
Retrieve the data dictionary from the buffer and then unpack it.


The trial run was successful.





The camera is connected not to the MCU but to the Raspberry Pi, so I set it up as a separate module, still inside the "perception" directory. `get_video` handles camera initialization and frame acquisition, `buffer` stores simple camera information (not the frames themselves), and `drawframe` is simply the drawing function.

`camera_buffer` will be called in `build_perception_state` to construct the `perceptionState`. In the `mjpeg_http` script responsible for the streaming service, the frames obtained from `get_video`, the `perceptionState` (which includes camera information), and the `drawframe` function will all be imported to render the complete visual output.
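Just to illustrate the streaming part, here is a minimal MJPEG-over-HTTP sketch on port 8081 using only the standard library and OpenCV; the real `mjpeg_http` script additionally calls `drawframe` with the `perceptionState` before encoding each frame.

# mjpeg_http sketch - serve the camera as an MJPEG stream on port 8081
import cv2
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

cap = cv2.VideoCapture(0)                  # in the real project get_video owns this

class MJPEGHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type",
                         "multipart/x-mixed-replace; boundary=frame")
        self.end_headers()
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # drawframe(frame, state) would be called here to add overlays
            ok, jpg = cv2.imencode(".jpg", frame)
            if not ok:
                continue
            self.wfile.write(b"--frame\r\nContent-Type: image/jpeg\r\n\r\n")
            self.wfile.write(jpg.tobytes())
            self.wfile.write(b"\r\n")

ThreadingHTTPServer(("0.0.0.0", 8081), MJPEGHandler).serve_forever()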

I added a face detection script, but it only serves the basic `perceptionState`, so it only reports whether a face is present and where it is, not who it is.

The Haar cascade model doesn't perform very well; faces need to be quite close to the camera to be detected. However, it's sufficient for verifying that the pipeline works, and the detection model can be upgraded later.
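The detection step itself is roughly the following (a minimal Haar cascade sketch with OpenCV):

# Haar face detection sketch - presence and location only, no identification
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # each entry is an (x, y, w, h) box; len(faces) gives the face count
    return faces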

I manually created a structural diagram of the current perception system, and I feel I now have a clearer understanding of the structure and the relationships between its components. 

I also noticed a pattern: `main.py` only imports scripts that both define their own functions and, in turn, import other scripts that contain nothing but their own functions.

Therefore, those scripts containing only self-contained functions seem to act as pure, minimal functional modules, which are first imported into scripts responsible for larger functional blocks and used within them. These larger functional block scripts are then imported by the `main` program. 

So I think this structure can be divided into: minimal function modules – complete functional modules (loops/services, etc.) – main program (scheduling & orchestration).


Draw a box in the `drawframe` function.
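A minimal sketch of that drawing step, assuming the detector hands over (x, y, w, h) boxes:

# drawframe sketch - draw the detected face boxes onto the frame with OpenCV
import cv2

def draw_face_boxes(frame, faces):
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  # green box
    return frame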


Different internal and external uses of the `perceptionState`:
In `__init__`, all dictionary structures are flattened, so internal code can access elements easily, such as `state.count`, without needing to write `state['camera']['face']['count']`.

At the same time, to ensure a well-structured output for external use, the complete dictionary hierarchy is reconstructed in the `to_dict` function.
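A simplified sketch of this pattern (the attribute and key names are illustrative, not the exact ones I use):

# PerceptionState sketch - flat attributes internally, nested dict externally
class PerceptionState:
    def __init__(self, data: dict):
        face = data.get("camera", {}).get("face", {})
        th = data.get("sensors", {}).get("temp_humi", {})
        # flattened for convenient internal access, e.g. state.count
        self.count = face.get("count", 0)
        self.temp = th.get("temp")
        self.humi = th.get("humi")

    def to_dict(self) -> dict:
        # rebuild the full hierarchy for external consumers
        return {
            "camera": {"face": {"count": self.count}},
            "sensors": {"temp_humi": {"temp": self.temp, "humi": self.humi}},
        }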





The audio data for "perception" needs to be upgraded. Previously I only received a volume value every 2 seconds; now I need to receive the complete audio stream.

The audio part mirrors the structure and relationship of "sensor buffer & receive TCP," but because the audio stream is binary rather than JSON, and its data volume and transmission frequency are much higher, it needs its own TCP connection. Port 8080 is already taken by the sensor TCP listener and 8081 by the MJPEG HTTP server, so I chose to bind the audio TCP to port 8082.
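A minimal sketch of that second listener; unlike the sensor receiver, it just reads fixed-size binary chunks and hands them to the audio buffer (sketched a little further down).

# receive_audio_tcp sketch - raw binary audio on port 8082, no JSON parsing
import socket
from audio_buffer import write_chunk   # assumed buffer interface, see below

def run(host="0.0.0.0", port=8082, chunk_size=4096):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        with conn:
            while True:
                chunk = conn.recv(chunk_size)
                if not chunk:
                    break              # MCU disconnected, wait for reconnect
                write_chunk(chunk)     # hand the raw bytes to the audio buffer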

The AUDIO BUFFER has three functional components: one that writes data coming in from TCP, one that the `perceptionState` builder reads from, and one that a future ASR (Automatic Speech Recognition) script will read from.
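A rough sketch of such a buffer; the function names and the ring-buffer size are placeholders.

# audio_buffer sketch - three roles: TCP writes chunks in, perceptionState reads
# a lightweight summary, and a future ASR script drains the raw audio.
import threading
from collections import deque

_lock = threading.Lock()
_chunks = deque(maxlen=256)        # ring buffer of recent raw audio chunks

def write_chunk(chunk: bytes):
    """Called by the audio TCP receiver."""
    with _lock:
        _chunks.append(chunk)

def get_summary() -> dict:
    """Called when building the perceptionState (lightweight info only)."""
    with _lock:
        return {"chunks": len(_chunks), "bytes": sum(len(c) for c in _chunks)}

def drain_audio() -> bytes:
    """Called by a future ASR script to take all buffered audio."""
    with _lock:
        data = b"".join(_chunks)
        _chunks.clear()
        return data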



The raw audio data and its parameters were also added to the PERCEPTION dictionary.

At this point, the data collection for the perception component is complete. However, I can't proceed directly to the "cognitive" stage yet. Instead, the audio and video data should first go through three pre-recognition steps: face recognition, action recognition, and speech recognition, to determine "who it is," "what action is being performed," and "what was said." This will yield the final, most complete PERCEPTION SNAPSHOT, which can then be fed directly into the COGNITIVE module for the LLM to build a world model.

Currently, this HTTP interface is my dashboard.









