Audio quality > 16 kHz #52
Comments
The issue is that the external/experimental esphome_audio component needed to get esp-idf working with a media player (rather than just a speaker) has some duplex limitations. Specifically, input and output get the same sample rate. There is a resampler, but for whatever reason it doesn't seem to work with the input (mic). So that means the sample rate has to be defined as exactly what the mic can handle. Which is 16kHz. You can try it yourself! Set the rate to 48000 (lines 147, 160) and the mic will stop working. Different values for input and output won't work. Add the resampler to the input (see lines 181, 171) and the mic will still not work. This is the only setting that currently works. Hopefully they get the resampler working with input, then you'll be able to set 48000 or whatever. IMO, the microwakeword version is not ready for use just yet. |
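For reference, the duplex constraint described above (one shared sample rate, pinned to what the mic can handle) would look roughly like this. It is only a sketch: the ids and pins are borrowed from the config posted later in this thread, and the real file may set further options.

```yaml
# Sketch only: both I2S endpoints must use the same sample rate.
# ids and pins are copied from the config posted later in this thread.
adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    sample_rate: 16000   # has to match the input rate
  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    pdm: false
    sample_rate: 16000   # the rate the mic can handle
```

Raising either value to 48000 without a working input resampler is the case reported above as breaking the mic.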
In my case I don't even use the microwakeword one as it does not allow me to use my own wakeword. The default one has the same audio "problem" even though there is nothing set up for 16 kHz inside the config. The wakeword one is not ready for use maybe, yes. When I try to use it I get funny errors, like: adf_pipeline.i2s_audio: [source /data/packages/2ab3d759/esphome/onju-voice-microwakeword.yaml:153] Unknown value 'audioin', did you mean 'audio_in', 'audio_out'?. |
@jherby2k do you know if there's an issue we can track regarding this limitation on gnumpi/esphome_audio? I can't seem to find one at first glance. |
There isn't - i added a comment to the existing thread you had there, but it was ignored. I've opened one now: gnumpi/esphome_audio#42 |
Thank you! |
Did someone already try it out with the new branch? I am not at home right now and can't test before tomorrow. I am not sure if my audioin renaming to audio_in is really correct, too. (As I said, I use the config without microwakeword right now as my personalized sarcastic catgirl OpenAI is called Luna ^^) I really wonder if I can create a wake word file for microwakeword too... |
I won’t have time until tomorrow
|
I tried my luck and if there is no fault on my side it does NOT work. I used this config:
and this package https://github.com/dreimer1986/onju-voice-satellite/blob/main/esphome/onju-voice-microwakeword_1.yaml and here the log: P.S. |
Somehow I am too stupid to make things work. According to the report over there it should work now, but something I do is wrong I guess? |
i was testing with gnumpi. please test the following config: https://github.com/dwyschka/esphome-configs/blob/main/onjou-voice.yml Adjust it to your needs pls |
@dwyschka thx for the help. It was me setting input and output to 16 bits... Now all is fine again. Music Assistant still is a mess and most of the time does not play, as MP3 is not my main source and it is the only format it seems to like, but the normal media play feature does fine, and there I can select the right files easily. So... Seems to be way better :D |
Tried that, but nope. Even then the stream seems to be sort of flac alike: http://192.168.181.42:8098/flow/media_player.onju_voice_e27688_onju_voice_satellite_e27688/87bd58754b47491b800608f96ed48eef.flac?ts=1718660677 it works in browser though, so this is fine, but Onju stays silent. I think that my beta fetish is in the way again ^^ I try to use 2.1.0b6 here. |
Did anyone try to correct the non microwakeword version, too? I tried to (see my fork) but it seems like my copy together tinkering is not working at all. |
Which audio provider are you using? i tested with spotify and yt. but, i set the audio quality on the onju to "44100 mhz 24bit". maybe thats the fix for you? |
I tried changing the supported audio output in Music Assistant to 44100 @24bit, but still have a flac stream inside onju's logs. Do you see a mp3 there? |
Subscribing because I'm interested in converting the 7 Google Home Minis around my home, too. |
Same |
I've tried upping the sample rate of the
|
Hm, maybe this explains why the microphone still is a weak spot on the Onju I tinker with? I have the same warnings of course and sometimes I have problems getting the wake word detected, especially if I am more than 2m away and/or speak slower. If so, maybe both settings should be added to configuration? Shall I open some issues or are you planning to do something in that direction already? Btw, just to be safe it's not my modding around making things worse: Only added:
This one builds fine even with the new Beta, but fails to link due to some Beta problems (esphome/issues#6036), so try it on stable version for now. (As I prepared for 2024.7 already you likely need the external components block from 1d83037 to make the older ESPHome happy again) |
OK, regarding the weak mic problem. I tinkered with all kinds of settings over the last few days and it did not get much better until... I talked to heinzpeda on the ESPHome Discord. He recommended using 48000 Hz for the pipeline, and the difference is night and day! I am back to accidental and funny activations, with it starting a sarcastic discussion about some unrelated phrase it picked up. That's how good the mic is again. The media player works flawlessly, too. I added an experimental routine that auto-disables the wake word on playback and reactivates it on pause and stop. All in all I have what I want now, except for some things like a way to reply to a question the Assistant asks and a custom micro wake word, but this is nothing we can do yet. |
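A routine like the auto-disable one mentioned above could be sketched with standard ESPHome media player triggers. This is not the code from the fork. It assumes a `micro_wake_word` component is configured elsewhere, that the `adf_pipeline` media player accepts the usual `on_play` / `on_pause` / `on_idle` triggers, and that "stop" corresponds to the idle state.

```yaml
# Hypothetical sketch, not the routine from the fork mentioned above.
# Assumes micro_wake_word is configured elsewhere in the same file.
media_player:
  - platform: adf_pipeline
    id: onju_out
    # ... keep the other options from your existing config ...
    on_play:
      - micro_wake_word.stop:    # stop listening while audio plays
    on_pause:
      - micro_wake_word.start:   # listen again when playback pauses
    on_idle:
      - micro_wake_word.start:   # and when playback stops
```

If the wake word ends up disabled after a voice response, these triggers are the first place to look, since announcements may also pass through the media player states.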
hey @dreimer1986 i tried your fork and had some success. These two settings seem to be causing inaccuracies though:
auto_gain: 31dBFS
volume_multiplier: 8
removing them massively improved whisper accuracy (i assume the mic was picking up lots of noise / clipping) yet it can still hear me from a ways off. also looks like you removed this under
gain_log2: 3
I had to disable your auto-disable wake word stuff as it tends to leave me in a broken state after one or two questions, and left my mute switch backwards. I'm also having general instability, but i'll fiddle some more. |
Unfortunately despite playing back at 48khz/32bit w/resampling and decoding directly from FLAC via Music Assistant, it still sounds like crap. I have an unmodified Nest Mini right next to it and it sounds way, way better on the same tracks. Sounds like a $5 bluetooth speaker / AM radio. Not sure this is something that can be fixed in software. |
I've fiddled with a bunch of settings to see what works, what's redundant etc and this is what i've found. Of note,
adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    fixed_settings: true
    use_apll: true # Quality - Seems to improve high frequency response to my ears at least
    channel: left
    sample_rate: 48000 # Quality improvement
    #adf_alc: true # implicit - if set to false, volume can't be controlled
    #alc_max: 0.5 # This just makes everything quieter. I've left it out
    #bits_per_sample: 32bit # Implicit - 32-bit is now the default
  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    fixed_settings: true
    use_apll: true # Set here to match the output
    channel: left
    sample_rate: 48000 # Set here to match the output
    pdm: false # required to compile
    #bits_per_sample: 32bit # Implicit - 32-bit is now the default
microphone:
  - platform: adf_pipeline
    id: onju_microphone
    keep_pipeline_alive: true
    gain_log2: 3
    pipeline:
      - adf_i2s_in
      - resampler # Quality - lets you use > 16 kHz sample rates
      - self
media_player:
  - platform: adf_pipeline
    id: onju_out
    name: None
    internal: false
    keep_pipeline_alive: true
    codec: auto # Quality - Lets Music Assistant stream FLAC directly, instead of using the transcoding to MP3 option
    pipeline:
      - self
      - resampler
      - adf_i2s_out
|
Where / how this would get integrated is above my skill level, but we might be able to improve ADC in software with something like this: https://github.com/G6EJD/ESP32-ADC-Accuracy-Improvement-function We would ideally take some measurements off an actual Onju and tweak as necessary. |
I thought it would be a bad idea to use this issue for my stuff, but as this is meant to help the original code, I think it's fine to continue. If not, @tetele, please just tell us. Sooo... many things to reply to:

Regarding auto_gain: 31dBFS and volume_multiplier: 8. Yes, the 8 was a bit too much; it was a remnant from when the microphone did a terrible job with the pipeline set to 44100 Hz, but I see no reason to remove it altogether. I never used Piper and Whisper, so no idea if they dislike it, but the official cloud solution does not. So I went down to volume_multiplier: 2 like the official S3 board configs did. Seems to be still fine over here.

Regarding gain_log2: 3 removed. Yes, I removed it as it had no positive effect here, only negative ones. Once more I took the official config as an example and removed it. I never had a situation where it was needed. In my case it almost completely disables any text detection, and after the wake word it's impossible to get any more information into it. I just added it again, had the same flaws, and removed it again, which made it work fine once more. So over here adding the setting is fatal.

The auto disable wake word stuff works fine here, so I keep it for now and do some more testing.

Regarding the audio quality. Tbh it was below my expectations, too. But as I never heard the original I just thought that's how it is. It's a small device and I thought I could not expect much more from it. So you say it should sound better, huh? I will try your APLL precision timer now, as this one is new to me, and give feedback on it. All in all, maxing out the quality is the reason for all the complex media player usage in this config, so we should do everything possible to make it do the best job it can.

EDIT: OK, one of them is quick to answer... use_apll is a GODSEND when the music playback gets more complex. Just listened to Sternhagelvoll from In Extremo and was surprised how good the bagpipes sound now. So yes, this one is a MUST HAVE!

EDIT2: And another new one... If I remove the auto_gain and volume_multiplier settings, then I can set gain_log2: 3 and it works fine. I have no clue which one to prefer here... yet. |
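To make the two options in EDIT2 concrete, here is a rough sketch of both variants. The thread only names the settings. Placing auto_gain and volume_multiplier under `voice_assistant:` follows the ESPHome voice assistant docs and is an assumption about the actual file, as is linking it to the `onju_microphone` id.

```yaml
# Variant A: software gain in the voice assistant, no gain_log2 on the mic.
voice_assistant:
  microphone: onju_microphone
  auto_gain: 31dBFS
  volume_multiplier: 2

# Variant B: gain on the microphone instead.
# Remove auto_gain and volume_multiplier above, then set:
# microphone:
#   - platform: adf_pipeline
#     id: onju_microphone
#     gain_log2: 3
```

According to the reports in this thread, combining gain_log2: 3 with the two voice_assistant settings is what breaks recognition, so only one variant should be active at a time.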
These are direct results from tetele#52 by jherby2k who did some heavy testing of all possible settings. Added APLL precision timer for a massive improvement of audio quality in media player mode. Complex music sounds more clear now, especially in higher frequencies. Lowered volume_multiplier to follow the official S3 board config and be closer to what the test results suggest. Left todo: gain_log2: 3 cannot be activated as suggested unless I remove auto_gain and volume_multiplier. Which variant is better? No idea yet...
Do go on, please. I'll collect this stuff when i finally get the chance to tackle the issue. My only question is how have you guys overcome the ring buffer overflow issue i've mentioned? |
Well... I did nothing at all. It just works fine here. I saw it a few times in my logs, but it seems to have no negative effect here. Sometimes my Onju is a bit moody and does not want to work on the 1st try, but most of the time it works just fine. Compared to the moodiness of my Echo devices it's still waay more reliable. More often I have a problem with micro wake word telling me that it detected my wake word, but the VAD says no. The exact wording is hard to give as I cannot make it output that right now... I will add it as soon as I manage to persuade it to do that again ^^ EDIT:
|
I only saw that ring buffer issue when not using the resampler for input and/or without the PSRAM. Not an issue for a while.
|
I still don't understand why Generally I think setting
Perhaps PCBWay used a different microphone for some reason? My quoted BOM mentions |
The difference is that I kept volume_multiplier and auto gain. Thus I bet it was WAAAY too loud. I went your way by now btw: dreimer1986@9747209 |
i'm very curious what you guys think about |
I already had my test. Not side by side, as I just have one Onju Voice right now. (Enough PCBs left to even sell a few, but no Nest to take apart.) But the results are clearly there and noticeable, and thus worth the change, I would say. I have my standard testing song with medieval instruments, bagpipes and singing (In Extremo - Sternhagelvoll). It starts with a bagpipe solo with an echo effect and then continues with electric guitar and drums. The first 15 sec already show a big difference. The solo itself does not differ much, but as soon as the guitar starts to play, the bagpipes used to become sort of "muffled". APLL reduces this IMO, and the bagpipes are clearly easier to pick out than before. I would say the quality clearly improves with APLL. Still not at an audiophile level, but this thing is a small handful of sound. Of course it's nothing compared to my Teufel Ultima 40 hell of a sound system. |
Hey guys, I don't know ESPHome that well (I'm just learning it) but I noticed a few changes in the latest version that I think may make it easier to improve the sound quality. esphome/esphome#7306 |
I did some tests on my devices and discovered something interesting.. in HA i enabled debug mode for my all 5 devices..
This is very strange because I thought they would be exactly the same PCB..Do you have similar ones? |
I've had issues on one of my PCBs with one Mic being defective (or just badly soldered). Luckily I was able to change the mic channel which then used the second mic on the PCB. At least that was my theory for that PCB... |
Another discovery: turning off the devices that were beeping for 5 minutes caused the squeaking to disappear. Now I was able to use the version from @jherby2k again on each of the devices. However, after checking, some of the devices have better sound quality than others. |
Seems there is something going on with your pcb's. I've just tested 2 of the 5 and they work great. |
You're in Norway? |
Not sure if this makes sense, but in my experience Onju can sound much better if we could adjust the equalizer. To me, the medium frequencies stand out too much, while hi and low are lacking. In Arduino-based version it's super trivial to add support for equalizer, so I did some quick patching in my local version of the component and I think it sounded much better with +6/-12/+6 gain adjustments. The ADF Pipeline version sounds... terrible to me "out of the box" but it gets notably better with Unfortunately, I couldn't make it work [at least yet]. But that's exactly why I'm writing here as I'm hoping someone who's more knowledgeable in this domain can help by pointing out what's wrong. I don't think I structured the code "in the right way" (eg the equalizer component should not be under i2s) but as I'm not familiar with the framework, was just trying to make things work with minimum amount of changes. The way I tried it in my config was:
Then the intent was to add numeric inputs to adjust each band to real-time tuning, and extend config to provide defaults, but I left it out of the above sample for brevity. As to how it doesn't work - the sound appears distorted with EQ enabled, hear the recordings: https://soundcloud.com/user-430926493-669422671/sets/adf-eq I also tried adding Hoping someone can help to figure this out so we can get our Onjus sounding even better while running on ADF with MWW. |
I'm also very interested in having an equalizer for the ADF pipeline. In my opinion, the Onju configuration lacks sufficient bass and perhaps has too many mids. I assume the original Google board simply applies an equalizer to optimize the sound for this relatively small speaker. I recently swapped out the boards and tested the original board and speaker beforehand. The original setup sounded significantly better than the Onju board and configuration. This suggests that improved audio quality may be achievable. |
Checklist
Is your feature request related to a problem? Please describe.
Not really a problem, just muffled audio because of the limit to 16000 Hz. I am very sure I read somewhere about a few reasons for using this limit. I just can't find it anymore.
Describe the solution you'd like
44100 Hz sounds like a nice solution, especially if the media player features are used more often.
Describe alternatives you've considered
Of course I can just use another device as a media player, but I want fewer devices rather than more, and Onju Voice should make my Echo obsolete; there is not much left to do for that.
Additional context