A digital dictionary that aims to help bring together the scattered Arabic terminology of the technical field

Dataset

Datasets

مجموعات البيانات

LINQ to DataSet

LINQ to DataSet

Dataset Designer

مصمم مجموعة البيانات

dataset parameter

محددة مجموعة البيانات

Common Voice Dataset

مجموعة بيانات «الصوت للعموم»

Other Voice Datasets

مجموعات البيانات الصوتية الأخرى

Other voice datasets…

مجموعات البيانات الصوتية الأخرى…

Download Dataset Bundle

نزّل حزمة مجموعات البيانات

Return to Common Voice Datasets

عُد إلى قواعد بيانات «الصوت للعموم»

What’s inside the Common Voice dataset?

ما الموجود في مجموعة بيانات «الصوت للعموم»؟

Help us build a high quality, publicly open dataset

ساعِدنا لبناء قاعدة بيانات عالية الجودة ومفتوحة للعموم

Common Voice data plus all other voice datasets above.

بيانات «الصوت للعموم» مع كل مجموعات البيانات أعلاه.

What level of audio quality is required for a voice clip to be used in the dataset?

ما المستوى المطلوب لجودة الصوت ليُستخدم المقطع الصوتي في قاعدة البيانات؟

Use of getUserData() or setUserData() is deprecated. Use WeakMap or element.dataset instead.

استخدام getUserData() أو setUserData() صار مهجورًا. استخدم WeakMap أو element.dataset بدلهما.

Do you have ideas on how we can make the Common Voice dataset better? Let us know on Discourse

ألديك أية أفكار نيّرة ترتقي بمجموعة بيانات «الصوت للعموم»؟ تفضل في دِسكورس وأخبرنا بها

To make it into the Common Voice dataset, a voice clip must be validated by two separate users.

ليدخل المقطع الصوتي مجموعة بيانات «الصوت للعموم»، على مستخدمين اثنين التحقق منه.

What does it mean that I can’t “determine the identity” of speakers in the Common Voice dataset?

ما معنى أنني لا أقدر على تحديد هويّة المتحدثين في قاعدة بيانات «الصوت للعموم»؟

You agree to not attempt to determine the identity of speakers in the Common Voice dataset

تُوافق على عدم التجربة ومحاولة تحديد هويّة أي مساهم في مجموعة بيانات «الصوت للعموم»

Want updates when we release a new version of the Common Voice dataset? Subscribe to our newsletter.

هل تريد استلام التحديثات عندما نُصدر نسخة جديدة من بيانات «الصوت للعموم» الصوتية؟ اشترك في نشرتنا الإخبارية.

Don’t see your language reflected in the Dataset? To request a language head over to our Languages page.

ألا ترى لغتك ضِمن مجموعة البيانات؟ توجه إلى صفحة ”اللغات“ في الموقع واطلبها.

Recording voice clips is an integral part of building our open dataset; some would say it's the fun part too.

يُعتبر تسجيل المقاطع الصوتية جزءًا لا يتجزّأ من صناعة مجموعة بيانات مفتوحة نقدر على استخدامها. قد يقول البعض أنها المرحلة الممتعة في خضمّ كل هذا.

You can help build a diverse, open-source dataset by creating a Common Voice profile and contributing your voice.

يمكنك تقديم يد العون لبناء مجموعة بيانات متنوّعة ومفتوحة المصدر بإنشاء حساب على «الصوت للعموم» والمساهمة بصوتك.

We’re building an open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

نصنعُ مجموعة بيانات بأصوات مفتوحة المصدر ومتعددة اللغات يُمكن لأي شخص استخدامها لتدريب التطبيقات التي تعمل بالتعرف الصوتي.

Would you like to request your voice recordings be deleted too, or do you prefer to keep them in the Common Voice dataset?

أتريد حذف تسجيلاتك الصوتية أيضًا، أم أنك تفضّل إبقاءها في مجموعة بيانات «الصوت للعموم»؟

The Common Voice Dataset contains hundreds of thousands of voice samples that help developers build voice recognition tools.

تحتوي مجموعة بيانات «الصوت للعموم» على مئات الآلاف من عيّنات الأصوات، فتساعد المطوّرين على بناء الأدوات اللازمة للتعرّف على الصوت.

We are building an open and publicly available dataset of voices that everyone can use to train speech-enabled applications.

نعمل على بناء مجموعات بيانات مفتوحة ومتاحة للعموم يمكن استخدامها في تدريب التطبيقات التي تعمل بالتعرف الصوتي.

The dataset is available now on our <downloadLink>download page</downloadLink> under the <licenseLink>CC-0</licenseLink> license.

مجموعات البيانات متوفرة في <downloadLink>صفحة التنزيل</downloadLink> برخصة <licenseLink>CC-0</licenseLink>.

Why an email? We may need to contact you in the future about changes to the dataset, and an email provides us with a point of contact.

لمَ تريدون بريدي؟ قد يحدث ونتواصل معك مستقبلا حول أية تغييرات على مجموعة البيانات. البريد الإلكتروني الذي تقدّمه هو وصلة الربط بيننا وبينك.

The process by which a contributor’s profile information is obscured from their donated voice clips when packaged for download as a part of the dataset.

العملية التي يجري فيها إخفاء معلومات حساب المساهم في المقاطع الصوتية التي ساهم بها، وذلك عند تحزيمها لتنزيلها كجزء من مجموعة البيانات.

We’re crowdsourcing an open-source dataset of voices. Donate your voice, validate the accuracy of other people’s clips, make the dataset better for everyone.

نستعينُ بالغير لصنع مجموعة بيانات صوتية مفتوحة المصدر. تبرّع بصوتك وتحقّق من دقّة مقاطع الغير وستجعل مجموعة البيانات هذه أفضل بكل المقاييس ولكل الناس.

The Clip Graveyard consists of voice clips that didn't make it into the Common Voice dataset. Just like the dataset, the Clip Graveyard is available for download.

تحتوي ”مقبرة المقاطع“ تلك المقاطع التي لم تدخل مجموعة بيانات «الصوت للعموم». كما الحال مع مجموعة البيانات، يمكن تنزيل ”مقبرة المقاطع“ أيضا.

Your anonymous voice recordings will remain in the Common Voice dataset. Once you delete your profile you will no longer be able to submit a request to remove your recordings from the dataset

ستبقى تسجيلاتك الصوتية في مجموعة بيانات «الصوت للعموم» بطريقة مجهّلة. لن تقدر على إرسال طلب لحذف تسجيلاتك من مجموعة البيانات ما إن تحذف ملفك الشخصي.

Common Voice is part of Mozilla's initiative to help teach machines how real people speak. In addition to the Common Voice dataset, we’re also building an open source speech recognition engine called Deep Speech.

مشروع «الصوت للعموم» هو جزء من مبادرة من شركة Mozilla تهدف إلى تعليم الآلة الكيفية التي ينطق بها بنو البشر. وإلى جانب مجموعة بيانات «الصوت للعموم» فنحن نبني أيضًا محرّكًا مفتوح المصدر للتعرّف على النطق أسميناه Deep Speech (النطق العميق).

We will review your request to remove your voice recordings from the dataset. If your request is approved, we will contact those who have downloaded the dataset and request they remove your voice recordings as well.

سنراجع طلبك بإزالة تسجيلاتك الصوتية من مجموعة البيانات. إن جرت الموافقة عليه فسنراسل مَن نزّل مجموعة البيانات ونطلب منهم حذف تسجيلاتك الصوتية من نسخهم أيضا.

The Common Voice dataset is available for download under the <licenseLink>CC0</licenseLink> license on <datasetLink>our Datasets page</datasetLink>. You can also download several other publicly available datasets from the same page.

مجموعة بيانات «الصوت للعموم» متاحٌ تنزيلها برخصة <licenseLink>CC0</licenseLink> وذلك من <datasetLink>صفحة مجموعات البيانات</datasetLink>. يمكنك أيضا تنزيل مجموعات بيانات أخرى منشورة للعموم في نفس الصفحة.

Common Voice is a collaborative project, and we're depending on our community of partners and contributors to build the largest open-source dataset of voices ever. We would like to thank the following people and organizations for their help with the project:

مشروع «الصوت للعموم» هو مشروع تعاوني نعتمد فيه على مجتمعنا من الشركاء والمساهمين لصناعة أكبر مجموعة بيانات صوتية مفتوحة المصدر عرفتها البشرية. نودّ تقديم خالص شكرنا لمن ساعدنا في هذا المشروع أشخاصًا كانوا أو منظّمات:

The goal of the Common Voice dataset is to enable anyone in the world to build speech recognition, speaker recognition, or any other type of application that requires voice data. A voice assistant is just one of many types of applications you could use the dataset to build.

الهدف من قاعدة بيانات «الصوت للعموم» هو إتاحة لجميع من في العالم تطوير أي تطبيق يعمل بالتعرف الصوتي، أو التعرف على هوية صاحب الصوت أو أي تطبيق آخر يحتاج بيانات صوتية. المساعد الصوتي هو أحد تلك التطبيقات العديدة والتي يمكنك استخدام مجموعة البيانات هذه لتطويرها.

The Common Voice dataset complements Mozilla’s open source voice recognition engine Deep Speech, which you can use to build speech recognition applications. Read our <githubLink>GitHub overview</githubLink> or join the <discourseLink>DeepSpeech Discourse</discourseLink> to learn how to get started.

تُكمّل مجموعة بيانات «الصوت للعموم» محرّكَ Mozilla المفتوح المصدر للتعرّف الصوتي Deep Speech ويمكنك استعمالها لتصنع منها تطبيقات تتعرّف على النطق. اقرأ <githubLink>النظرة العامة على غِت‌هَب</githubLink> أو انضمّ معنا إلى <discourseLink>دِسكورس DeepSpeech</discourseLink> لتعرف كيف تبدأ مشوار الألف ميل.

All voice clips in the dataset are scrubbed of personally identifying information. When a contributor provides demographic data via their profile, that information is de-identified from their voice clips before being bundled for download in the dataset and is never made public on their profile page.

تُنظّف كل المقاطع الصوتية في مجموعة البيانات من أية معلومات تميّز مَن أرسلها. عندما يُقدّم المساهم البيانات الديموغرافية في حسابه، تُزال تلك المعلومات من مقاطع الصوت التي أرسلها وذلك قبل تحزيمها لتنزيلها في مجموعة البيانات. لا ننشر هذه المعلومات في صفحة حسابه للعموم أبدًا.

We believe that large and publicly available voice datasets foster innovation and healthy commercial competition in machine-learning based speech technology. This is a global effort and we invite everyone to participate. Our aim is to help speech technology be more inclusive, reflecting the diversity of voices from around the world.

نؤمن بأن مجموعات البيانات الصوتية الكبيرة والمتاحة للعموم تُشجّع على الابتكار والمنافسة التجارية الصحيحة في تقنيات تعليم الآلات التعرفَ الصوتي. إن هذا المجهود عالمي وندعو الجميع للمشاركة. هدفنا هو مساعدة تقنيات التعرف الصوتي لتكون شاملة أكثر وتعكس اختلاف الأصوات وتنوعها حول العالم.

Optionally submitted demographic data (e.g. age, sex, language, and accent) will never be made public on your profile, and will not be linked to your account in the dataset. Individual audio clips will be associated with demographic data for the purpose of more accurate analysis - for example, a researcher might want to target a training model to a specific demographic segment.

لن يُتاح ما تُرسله اختياريا من بيانات ديموغرافية (مثل العمر والجنس واللغة واللكنة) في ملفك الشخصي، ولن تُربط بحسابك في مجموعة البيانات. ستُربط المقاطع الصوتية المنفردة بالبيانات الديموغرافية لغرض زيادة دقة التحليل - كأن يريد باحث أن يستهدف نموذجا تدريبيا لفئة ديموغرافية معينة.

Optionally submitted demographic data (e.g. age, gender, language, and accent) will never be made public on your profile, and will not be linked to your account in the dataset. Individual audio clips will be associated with demographic data for the purpose of more accurate analysis - for example, a researcher might want to target a training model to a specific demographic segment.

لن يُتاح ما تُرسله اختياريا من بيانات ديموغرافية (مثل العمر والجنس واللغة واللكنة) في ملفك الشخصي، ولن تُربط بحسابك في مجموعة البيانات. ستُربط المقاطع الصوتية المنفردة بالبيانات الديموغرافية لغرض زيادة دقة التحليل - كأن يريد باحث أن يستهدف نموذجا تدريبيا لفئة ديموغرافية معينة.

We want the Common Voice dataset to reflect the audio quality a speech-to-text engine will hear in the wild, so we’re looking for variety. In addition to a diverse community of speakers, a dataset with varying audio quality will teach the speech-to-text engine to handle various real-world situations, from background talking to car noise. As long as your voice clip is intelligible, it should be good enough for the dataset.

نريد من قاعدة بيانات «الصوت للعموم» أن تعكس جودة الصوت التي سيستلمها محرّك التعرف على الكلام والتي يمكن أن تكون بأي حالة، لذلك فنحن نبحث عن التنوّع في البيانات. عدا كون مجتمع المتحدثين شديد التنوّع، فقاعدة البيانات التي تحتوي مختلف جودات الصوت ستعلّم محركات التعرف على الكلام طريقة التعامل مع مختلف الحالات في الواقع اليومي، بدءا بالأصوات البعيدة ووصولا إلى إزعاج السيارات. طالما أن المقطع الصوتي الذي قدّمته واضح ومفهوم، فسيكون ذلك كافيا لتضمينه في قاعدة البيانات.

The Common Voice dataset complements Mozilla’s open source voice recognition engine Deep Speech. The first version of Deep Speech was released in November 2017 and has continued to evolve ever since. Together with the Common Voice dataset, we believe this open source voice recognition technology should be available to everybody. It’s our hope these technologies will enable developers to build a wave of innovative products and services.

تُكمّل مجموعة بيانات «الصوت للعموم» محرّكَ Mozilla المفتوح المصدر للتعرّف الصوتي Deep Speech. صدرت الإصدارة الأولى من هذا المحرّك في نوفمبر 2017 وتواصل تطويره وقوّته منذئذ. نرى نحن أنّه بدمج هذا المحرّك مع مجموعة بيانات «الصوت للعموم» ستصير تقنيّة التعرّف الصوتي مفتوحة المصدر مُتاحة لجميع الناس. كلّنا أمل بأن تسمح هذه التقنيّات وتقدّم للمطوّرين حلولًا لصناعة منتجات وخدمات مبتكرة لها بداية وليس لها نهاية.

We believe that large, publicly available voice datasets will foster innovation and healthy commercial competition in machine-learning based speech technology. Common Voice’s multi-language dataset is already the largest publicly available voice dataset of its kind, but it’s not the only one. Look to this page as a reference hub for other open source voice datasets and, as Common Voice continues to grow, a home for our release updates.

نؤمن بأن مجموعات البيانات الصوتية الكبيرة والمُتاحة للعموم ستعزّز الابتكار والمنافسة التجارية السليمة في تقنية الكلام بمعونة تعلّم الآلات. مجموعة بيانات «الصوت للعموم» متعدّدة اللغات هي فعلًا أكبر مجموعة بيانات صوتية مُتاحة للعموم من نوعها، إلّا أنها ليست الوحيدة. لتكن هذه الصفحة دليلك لمجموعات البيانات الصوتية الأخرى مفتوحة المصدر، وأيضًا محطة لأحدث إصدارات «الصوت للعموم» كلّما توسّعت أكثر.

Each entry in the dataset consists of a unique MP3 and corresponding text file. Many of the { $total } recorded hours in the dataset also include demographic metadata like age, sex, and accent that can help train the accuracy of speech recognition engines. The dataset currently consists of { $valid } validated hours in { $languages } languages, but we’re always adding more voices and languages. Take a look at our <languagesLink>Languages page</languagesLink> to request a language or start contributing.

لكلّ مدخلة في مجموعة البيانات ملف MP3 فريد وملف نصي مقابل له. كما يتضمّن كثير من الساعات المسجّلة الـ { $total } بيانات ديموغرافية مثل العمر والجنس واللكنة تساعد على تحسين دقّة محرّكات التعرّف على النطق. تحتوي مجموعة البيانات الآن على { $valid } من الساعات المدقّقة موزّعة على { $languages } من اللغات، ومع ذلك فنحن نُضيف أصواتًا ولغات أكثر على الدوام. طالِع <languagesLink>صفحة اللغات</languagesLink> في المشروع لتطلب لغةً أو تبدأ المساهمة بها.

To make the Common Voice dataset as useful as possible we have decided to only allow source text that is available under a Creative Commons (CC0) license. Using the CC0 standard means it's more difficult to find and collect source text, but allows anyone to use the resulting voice data without usage restrictions or authorization from Mozilla. Ultimately, we want to make the multi-language dataset as useful as possible to everyone, including researchers, universities, startups, governments, social purpose organizations, and hobbyists.

لتتحقق أقصى استفادة من مجموعة بيانات «الصوت للعموم»، قرّرنا بألا نسمح بأية نصوص مصدرية غير متاحة برخصة المشاع الإبداعي (CC0). استعمال معيار CC0 (أي ”ملك عام“) يصعّب العثور على النصوص المصدرية وجمعها، لكنه يسهّل على الجميع أيما كان استعمال البيانات الصوتية الناتجة دون أية تقييدات أو تصاريح من موزيلا نفسها. ما نريده أساسا هو أن تكون مجموعة البيانات متعددة اللغات مفيدة للجميع كافة ما أمكن، أكانوا باحثين أو جامعات أو شركات ناشئة أو حكومات أو منظمات تتعلق بالمجتمعات أو هواة.

The Common Voice dataset is an open and publicly available resource that can be used to train a wide variety of speech-enabled applications. To protect the security of our contributors, we ask everyone who downloads the Common Voice dataset to respect contributors’ privacy. All voice clips in the dataset are scrubbed of personally identifying information. When you download the dataset, you agree to not attempt to determine the identity of any contributor. That means you cannot try to link information in the dataset to a contributor’s personal information. You may, however, use the dataset to train speech recognition, speaker recognition, or other applications, by, for instance, linking information in the dataset to other information already in the dataset.

مجموعة بيانات «الصوت للعموم» مورد مفتوح ومتاح للعموم ويمكن استخدامه لتدريب مجموعة واسعة من التطبيقات المختلفة والتي تعمل بالتعرف الصوتي. لحماية أمن وسرّية مساهمينا نطلب من جميع من ينزّل مجموعة بيانات «الصوت للعموم» احترامَ خصوصية المساهمين. تُنظّف كل المقاطع الصوتية في مجموعة البيانات من أية معلومات تميّز مَن أرسلها. عندما تنزّل قاعدة البيانات فأنت موافق على عدم التجربة ومحاولة تحديد هويّة أي مساهم. يعني هذا بأنك لا تستطيع ربط المعلومات في مجموعة البيانات بمعلومات المساهم الشخصية. مع ذلك، يمكنك استخدام مجموعة البيانات لتدريب تقنيات التعرف الصوتي، والتعرف على هوية صاحب الصوت أو أية تطبيقات أخرى لأعمال مثل ربط المعلومات في مجموعة البيانات بمعلومات أخرى موجودة في مجموعة البيانات بالفعل.