Monday, August 15, 2022

iOS (Swift) Speech Transcription – capturing last word/number during continuous transcription


I’m struggling a bit with a Speech app I’m working on. I followed the dev example for creating a Speech Recognizer on Apple’s developer site here, and my code is below. It’s working well, and I get continuous recognition as expected.

However, my app idea requires me to capture each number in a series of numbers as they’re spoken. With the code below, I can successfully speak a long series of numbers (e.g. “2, 5, 3, 7, 10, 6…”) and once I stop it will eventually return an SFTranscription array with transcriptions holding segments for each number I spoke. The reason I say eventually is because the speech recognizer is constantly trying to determine an intelligible response in human language or formats (in this case, phone numbers, larger multi-digit numbers, and so on), which is what it should do for dictation and natural language. But I need to get each word (number) as it’s spoken, before the recognizer tries to make sense of it. Is there a way to capture the last word before the recognizer attempts to relate it to all the words spoken prior?
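One possible approach (a minimal sketch, not from the original post): since each partial result carries the full segment list recognized so far, you could track how many segments the previous callback contained and emit only the newly appended ones, resetting whenever the recognizer re-interprets earlier audio and the count shrinks. The helper name `newSegments` and the `previousCount` bookkeeping are hypothetical:

```swift
// Hypothetical helper: given the full list of segment substrings from a
// partial result, return only the segments not seen in the previous callback.
// `previousCount` tracks how many segments the last partial result contained.
func newSegments(current: [String], previousCount: inout Int) -> [String] {
    // If the recognizer re-interpreted earlier audio (e.g. "Four", "seven"
    // collapsing into "47"), the segment count can shrink; reset so we
    // re-emit from the start of the revised transcription.
    if current.count < previousCount {
        previousCount = 0
    }
    let fresh = Array(current.suffix(from: previousCount))
    previousCount = current.count
    return fresh
}

// Example usage with the kind of partial results shown below:
var seen = 0
print(newSegments(current: ["Four"], previousCount: &seen))           // ["Four"]
print(newSegments(current: ["Four", "seven"], previousCount: &seen))  // ["seven"]
print(newSegments(current: ["47"], previousCount: &seen))             // ["47"] (count shrank, re-emitted)
```

This doesn’t stop the recognizer from revising earlier words, but it does surface each word once at the moment it first appears in a partial result.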

import UIKit
import Speech

public class ViewController: UIViewController, SFSpeechRecognizerDelegate {

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()
    @IBOutlet var textView: UITextView!
    @IBOutlet var recordButton: UIButton!

    private var isListening = false

    public override func viewDidLoad() {
        super.viewDidLoad()
        recordButton.isEnabled = false
        textView.isEditable = false
    }

    override public func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        speechRecognizer.delegate = self
        SFSpeechRecognizer.requestAuthorization { authStatus in
            OperationQueue.main.addOperation {
                switch authStatus {
                case .authorized:
                    self.recordButton.isEnabled = true
                case .denied:
                    self.recordButton.isEnabled = false
                    self.recordButton.setTitle("User denied access to speech recognition", for: .disabled)
                case .restricted:
                    self.recordButton.isEnabled = false
                    self.recordButton.setTitle("Speech recognition restricted on this device", for: .disabled)
                case .notDetermined:
                    self.recordButton.isEnabled = false
                    self.recordButton.setTitle("Speech recognition not yet authorized", for: .disabled)
                default:
                    self.recordButton.isEnabled = false
                }
            }
        }
    }

    private func startRecording() throws {
        recognitionTask?.cancel()
        self.recognitionTask = nil
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        let inputNode = audioEngine.inputNode
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object") }
        recognitionRequest.shouldReportPartialResults = true
        if #available(iOS 13, *) {
            recognitionRequest.requiresOnDeviceRecognition = false
        }
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
            var isFinal = false
            if let result = result {

                if self.isListening {
                    result.transcriptions.forEach { transcription in  // Capture SFTranscription from result
                        transcription.segments.forEach { segment in
                            print( segment.substring )
                        }
                    }
                    print("---")
                }

                isFinal = result.isFinal
            }
            if error != nil || isFinal {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
                self.recordButton.isEnabled = true
                self.recordButton.setTitle("Start Recording", for: [])
            }
        }
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            self.recognitionRequest?.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
        textView.text = "(Go ahead, I'm listening)"
    }

    public func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
        if available {
            recordButton.isEnabled = true
            recordButton.setTitle("Start Recording", for: [])
        } else {
            recordButton.isEnabled = false
            recordButton.setTitle("Recognition Not Available", for: .disabled)
        }
    }

    @IBAction func recordButtonTapped() {
        if audioEngine.isRunning {
            audioEngine.stop()
            recognitionRequest?.endAudio()
            recordButton.isEnabled = false
            recordButton.setTitle("Stopping", for: .disabled)
            self.isListening = false
        } else {
            do {
                try startRecording()
                recordButton.setTitle("Stop Recording", for: [])
                self.isListening = true
            } catch {
                recordButton.setTitle("Recording Not Available", for: [])
            }
        }
    }
}

Example output for saying “4, 7, 5, 5, 7, 3” – each block separated by “---” represents all segments in a single returned transcription.

Four
---
Four
seven
---
47
---
475
---
4
7
5
---
4755
---
47554
---
475543
---
475543
---
4
7
5
5
4
3

I can handle the spelled-out words (e.g. “Four”) easily with a function, but the long concatenated number strings are what’s fouling me up. I want to capture each one before they get concatenated, and not have to wait until the very end when it eventually separates them into individual segments.
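For what it’s worth, the concatenated strings can also be split after the fact: each character of a string like “4755” is itself a digit, and spelled-out words can go through a lookup table. A minimal sketch (the function name `digits(from:)` and the word table are illustrative, and the table is deliberately not exhaustive):

```swift
// Hypothetical post-processing: split a concatenated digit string such as
// "4755" into its individual digits, and map spelled-out words to digits.
func digits(from segment: String) -> [String] {
    let words = ["zero": "0", "one": "1", "two": "2", "three": "3",
                 "four": "4", "five": "5", "six": "6", "seven": "7",
                 "eight": "8", "nine": "9"]
    if let digit = words[segment.lowercased()] {
        return [digit]
    }
    // Fall back to splitting character by character, keeping only digits.
    return segment.filter { $0.isNumber }.map(String.init)
}

print(digits(from: "4755"))  // ["4", "7", "5", "5"]
print(digits(from: "Four"))  // ["4"]
```

This only addresses the formatting, not the timing: the digits still arrive in batches as the recognizer revises its transcription, so it complements rather than replaces capturing each partial result as it comes in.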

Thanks for any help!

