I’m struggling a bit with a Speech app I’m working on. I followed the dev example for creating a Speech Recognizer on Apple’s developer website here, and my code is below. It’s working well, and I get continuous recognition as expected.
However, my app idea requires me to capture each number in a series of numbers as they’re spoken. With the code below, I can successfully speak a long series of numbers (e.g. “2, 5, 3, 7, 10, 6…”) and once I stop it will eventually return an SFTranscription array with transcriptions holding segments for each number I spoke. The reason I say eventually is because the speech recognizer is constantly trying to determine an intelligible response in human language or formats (in this case, phone numbers, larger multi-digit numbers, and so on), which is what it should do for dictation and human language. But I need to get each word (number) spoken as it’s said, before it tries to make sense of it. Is there a way to capture the last word before the recognizer attempts to relate it to all other words prior?
import UIKit
import Speech

public class ViewController: UIViewController, SFSpeechRecognizerDelegate {
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    @IBOutlet var textView: UITextView!
    @IBOutlet var recordButton: UIButton!

    private var isListening = false

    public override func viewDidLoad() {
        super.viewDidLoad()
        recordButton.isEnabled = false
        textView.isEditable = false
    }

    override public func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        speechRecognizer.delegate = self

        SFSpeechRecognizer.requestAuthorization { authStatus in
            OperationQueue.main.addOperation {
                switch authStatus {
                case .authorized:
                    self.recordButton.isEnabled = true
                case .denied:
                    self.recordButton.isEnabled = false
                    self.recordButton.setTitle("User denied access to speech recognition", for: .disabled)
                case .restricted:
                    self.recordButton.isEnabled = false
                    self.recordButton.setTitle("Speech recognition restricted on this device", for: .disabled)
                case .notDetermined:
                    self.recordButton.isEnabled = false
                    self.recordButton.setTitle("Speech recognition not yet authorized", for: .disabled)
                default:
                    self.recordButton.isEnabled = false
                }
            }
        }
    }

    private func startRecording() throws {
        // Cancel any previous recognition task.
        recognitionTask?.cancel()
        self.recognitionTask = nil

        // Configure the audio session for recording.
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        let inputNode = audioEngine.inputNode

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object") }
        recognitionRequest.shouldReportPartialResults = true
        if #available(iOS 13, *) {
            recognitionRequest.requiresOnDeviceRecognition = false
        }

        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
            var isFinal = false
            if let result = result {
                if self.isListening {
                    result.transcriptions.forEach { transcription in // Grab each SFTranscription from the result
                        transcription.segments.forEach { segment in
                            print(segment.substring)
                        }
                    }
                    print("---")
                }
                isFinal = result.isFinal
            }
            if error != nil || isFinal {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
                self.recordButton.isEnabled = true
                self.recordButton.setTitle("Start Recording", for: [])
            }
        }

        // Feed microphone audio into the recognition request.
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
        textView.text = "(Go ahead, I'm listening)"
    }

    public func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
        if available {
            recordButton.isEnabled = true
            recordButton.setTitle("Start Recording", for: [])
        } else {
            recordButton.isEnabled = false
            recordButton.setTitle("Recognition Not Available", for: .disabled)
        }
    }

    @IBAction func recordButtonTapped() {
        if audioEngine.isRunning {
            audioEngine.stop()
            recognitionRequest?.endAudio()
            recordButton.isEnabled = false
            recordButton.setTitle("Stopping", for: .disabled)
            self.isListening = false
        } else {
            do {
                try startRecording()
                recordButton.setTitle("Stop Recording", for: [])
                self.isListening = true
            } catch {
                recordButton.setTitle("Recording Not Available", for: [])
            }
        }
    }
}
Example output for saying “4, 7, 5, 5, 7, 3” (each block after “---” represents all segments in one returned transcription):
For
---
For
seven
---
47
---
475
---
4
7
5
---
4755
---
47554
---
475543
---
475543
---
4
7
5
5
4
3
I can handle the spelled-out words (e.g. “four”) easily with a function, but the long concatenated number strings are what’s fouling me up. I want to capture them before they get concatenated, and not have to wait until the very end when it eventually separates them into individual segments.
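One direction I’ve been sketching (untested, so treat it as an idea rather than working code) is to remember the timestamp of the last segment I’ve already handled, and on each partial result only act on segments that start after it, since each SFTranscriptionSegment carries a `timestamp` and `duration`. The `Segment` struct and `newWords` function below are my own stand-ins so the logic can be shown outside the framework:

```swift
import Foundation

// Stand-in for SFTranscriptionSegment (which exposes `substring`,
// `timestamp`, and `duration`) so the filtering logic runs in isolation.
struct Segment {
    let substring: String
    let timestamp: TimeInterval
}

// Return only the segments that begin after the last word already handled,
// plus the new high-water mark to remember for the next partial result.
func newWords(since lastHandled: TimeInterval,
              in segments: [Segment]) -> (fresh: [Segment], mark: TimeInterval) {
    let fresh = segments.filter { $0.timestamp > lastHandled }
    let mark = segments.last?.timestamp ?? lastHandled
    return (fresh, mark)
}

// Partial result 1 delivers "4" and "7" -- both are new.
let first = newWords(since: -1,
                     in: [Segment(substring: "4", timestamp: 0.0),
                          Segment(substring: "7", timestamp: 0.6)])

// Partial result 2 repeats them and adds "5" -- only "5" is new.
let second = newWords(since: first.mark,
                      in: [Segment(substring: "4", timestamp: 0.0),
                           Segment(substring: "7", timestamp: 0.6),
                           Segment(substring: "5", timestamp: 1.2)])
```

The open question with this approach is what happens when the recognizer re-segments, e.g. when “4”, “7”, “5” collapse into a single “475” segment: I don’t know whether the merged segment keeps the first word’s timestamp, which is part of why I’m asking whether there’s a supported way to lock in words as they’re spoken.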
Thanks for any help!