The aim of the R&D project (ID: 2019-1.1.1-PIACI-KFI-2019-00126) was to create an artificial intelligence-based system that supports the targeted mapping and processing of economic and business content published on the internet, as well as the production of editing-ready raw materials in Hungarian. The core of the development concept was to let analysts and journalists, who perform high-value-added work, draw on pre-filtered, structured, and pre-processed source materials instead of spending time on routine information gathering and basic translation. This is intended to shorten editorial turnaround times and significantly expand the volume of information that can be processed.
As a result of the project, a cloud-based, modular data collection and processing infrastructure was established in which data collection, aggregation, and the execution of processing and analysis steps are clearly separated. The central element of the architecture is a structured data storage layer and its associated suite of data processing applications. These ensure that information arriving from various sources enters the subsequent AI-based analysis and content preparation processes in a standardized format. The system was designed to interface directly with the internal administrative and editorial dashboard, so that extracted information and generated content proposals can be incorporated directly into the daily workflow.
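To make the idea of a standardized format concrete, the following is a minimal sketch of mapping a source-specific record onto one common structure. The `NewsItem` fields, the `from_feed_entry` helper, and the shape of the input dictionary are all hypothetical; the project's actual schema is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NewsItem:
    """Hypothetical common record shared by all downstream processing steps."""
    source: str
    title: str
    body: str
    published: datetime  # always stored in UTC


def from_feed_entry(source, entry):
    """Map a feed-style dict (assumed keys: title, summary, published) to a NewsItem."""
    return NewsItem(
        source=source,
        title=entry["title"].strip(),
        body=entry.get("summary", "").strip(),
        # Parse the ISO 8601 timestamp and normalize it to UTC.
        published=datetime.fromisoformat(entry["published"]).astimezone(timezone.utc),
    )


item = from_feed_entry(
    "example-feed",
    {
        "title": "  Forint strengthens ",
        "summary": "The currency gained against the euro.",
        "published": "2021-03-01T09:30:00+01:00",
    },
)
```

Each additional source would need only its own small mapping function, while the analysis steps keep working against the single record type.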
One of the main achievements in professional content preparation was the classification, grouping, and redundancy reduction of news. The project implemented the thematic and organizational categorization of content, together with grouping solutions based on vectorization and clustering and with entity-based arrangement. In parallel, information extraction methods were integrated. Relying on keyword-based and deep learning approaches, these can identify key information, relevant phrases, and the focal points of a text, and from these produce the short summaries needed for swift editorial decision-making.
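The grouping by vectorization and clustering described above can be illustrated with a small, self-contained sketch. It is not the project's implementation: it assumes plain term-frequency vectors, cosine similarity, and a greedy single-pass grouping with a hypothetical `threshold`, whereas a production system would more likely use learned embeddings and a dedicated clustering algorithm.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a sparse term-frequency vector (word -> count)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar(texts, threshold=0.6):
    """Greedy single-pass grouping.

    Each text joins the first group whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new group.
    Returns a list of groups, each a list of indices into `texts`.
    """
    groups = []  # list of (representative_vector, member_indices)
    for index, text in enumerate(texts):
        vec = vectorize(text)
        for representative, members in groups:
            if cosine(vec, representative) >= threshold:
                members.append(index)
                break
        else:
            groups.append((vec, [index]))
    return [members for _, members in groups]


headlines = [
    "Central bank raises base rate by 50 basis points",
    "Central bank raises the base rate by 50 basis points",
    "Carmaker announces new battery plant",
]
groups = group_similar(headlines)
```

Here the first two headlines land in one group and the third in its own, so an editor would see one item per story rather than every near-duplicate report.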
A prominent, innovative element of the project was text generation and multilingual content preparation. Within this framework, several generation approaches based on modern neural language models were tested and compiled into an application-ready prototype, supplemented with machine translation components. The goal of this development was not unsupervised, automatic publishing, but the production of article outlines and raw materials that work in tandem with editorial quality assurance, thereby strengthening journalistic added value. The system prepares, organizes, and proposes source information as text, while the final content decision and responsibility remain within the editorial process.
Among the tasks associated with the project's final milestone, the processing of audiovisual content also emerged as a key direction. Solutions supporting the transcription of audio and video streams, and the subsequent processing of the resulting texts from an economic perspective, were examined and implemented at a prototype level. Consequently, the system became capable of generating content summaries and editing-ready raw materials from spoken content as well as from written news sources. As part of the development package, automated notification and monitoring functions were also implemented. Linked to keywords, topics, or market signals, these can quickly flag relevant content, further reducing response times in economic news reporting.
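A keyword-linked notification of the kind described above can be sketched in a few lines. The `match_alerts` function and the contents of `watchlist` are hypothetical and only illustrate the matching step; the sketch handles single-word keywords only and leaves out delivery, topic models, and market signals.

```python
import re


def match_alerts(text, watchlist):
    """Return the watchlist topics that have at least one keyword in the text.

    `watchlist` maps a topic name to a list of single-word keywords.
    Matching is case-insensitive and works on whole words.
    """
    words = set(re.findall(r"\w+", text.lower()))
    hits = []
    for topic, keywords in watchlist.items():
        if any(keyword.lower() in words for keyword in keywords):
            hits.append(topic)
    return hits


watchlist = {
    "monetary policy": ["inflation", "rate"],
    "energy": ["gas", "electricity"],
}
alerts = match_alerts("Inflation slowed in May, the statistics office said.", watchlist)
```

The same function could be applied to transcripts of spoken content, since it only needs plain text as input.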
Overall, the tangible result of the project is an integrated, AI-supported content preparation platform designed to fit into the editorial process. The platform covers several highly resource-intensive sub-processes of economic content generation, from source discovery, information cleansing, and grouping to summarization and the production of text proposals. Following the project's closure, the directions for further development are clear: following the rapid advancement of new language models, incorporating editorial feedback, adding real-time speech recognition, and integrating additional sources. In the longer term, this establishes a strong foundation for regional and multilingual utilization.







