Enhancing Invoice Number Retrieval in Influencer Agency Transaction Records Using TF-IDF and SVC Model

 

Frisca Fitria

Universitas President University, Indonesia

 

*Email: [email protected]

*Correspondence: Frisca Fitria

 

 

DOI: 10.59141/comserva.v4i1.1339

 

 

 

 

 

ABSTRACT

This research explores the need for efficient invoice management in influencer marketing. The proposed solution uses TF-IDF and an SVC Model to locate and handle crucial transaction information, including dates, brand names, invoice numbers, amounts, frequency, and category. The approach aims to improve the accuracy of identifying invoice numbers in influencer agency records. The method is evaluated using precision, accuracy, and F1 Score metrics to assess its effectiveness against the current challenges of manual extraction. Overall, this paper offers a practical solution for enhancing the management of influencer agency transactions.

 

Keywords: TF-IDF, SVC Model, Invoice Number Retrieval, Influencer Agency, Transaction Records, Precision, Accuracy.

 

 

INTRODUCTION

Background

In modern advertising, influencer marketing has become essential, signalling a shift in consumer-brand dynamics (Yeo et al., 2023). The rising prevalence of influencer collaborations underscores the need for efficient transaction record management, vital for financial accountability and operational efficiency in influencer agencies. Given influencers' significant impact on consumer preferences and purchasing decisions, effective management of the endorsed products or brands is crucial. This research aims to enhance this process through innovative techniques, such as TF-IDF and SVC Model, addressing manual extraction challenges and offering practical solutions for streamlined influencer agency transactions.

Problem Statement

Manual extraction poses a hindrance to the seamless financial workflows in influencer marketing. To address this efficiency gap, this research advocates for the application of TF-IDF and SVC Model techniques. These advanced methods aim to enhance the accuracy of identifying invoice numbers within influencer agency records. This strategic approach aligns with the industry's escalating reliance on influencers for effective brand promotion, as highlighted by Yeo et al. in 2023. By leveraging innovative techniques, we aim to streamline and optimize the process of managing financial transactions within the dynamic landscape of influencer marketing.

Comparison with existing approach

Manjari, Rousha, Sumanth, and Devi (2020) perform extractive text summarization of web pages with TF-IDF and Selenium, contrasting with our proposed system's focus on accuracy in influencer agency transaction detail identification. While both utilize TF-IDF, our approach is tailored to address challenges specific to influencer agency transactions, optimizing accuracy in this domain.

Yang and Long (2023) enhance association rule mining with TF-IDF, whereas our system utilizes TF-IDF and SVC Model for transaction detail identification in influencer agency records. While both approaches utilize TF-IDF, our focus on accuracy within influencer agency transactions addresses different challenges and aims for specific optimizations.

Pramono, Rohman, and Hindersah (2013) modify TF-IDF for topic extraction, contrasting with our system's use of TF-IDF and SVC Model for transaction detail extraction in influencer agency records. While both utilize TF-IDF, our approach addresses challenges specific to influencer agency transactions, optimizing accuracy within this domain.

Kadhim (2019) compares TF-IDF and BM25 for feature extraction, whereas our system focuses on accuracy in influencer agency transaction detail extraction using TF-IDF and SVC Model. While both approaches utilize TF-IDF, our system addresses challenges specific to influencer agency transactions, optimizing accuracy in this domain.

Ngan, Lee, and Khor (2023) automate paper assignment using TF-IDF and classification, while our system enhances accuracy in influencer agency transaction detail extraction with TF-IDF and SVC Model. Though both employ TF-IDF, our approach addresses specific challenges in influencer agency transactions, optimizing accuracy for this domain.

Zheng Huang et al. (2019) automate scanned receipt processing, whereas our proposed system aims to enhance accuracy in identifying transaction details within influencer agency records using TF-IDF and SVC Model. While both approaches focus on document analysis, our system is tailored to address specific challenges in influencer agency transactions, optimizing accuracy within this domain.

Qiang and Zhong-min (2022) extract symptom information from medical cases using TF-IDF and Word2Vec, contrasting with our system's focus on accuracy in influencer agency transaction detail extraction with TF-IDF and SVC Model. While both utilize TF-IDF, our approach addresses challenges specific to influencer agency transactions, optimizing accuracy within this domain.

Nugawela, Abeywardena, and Mahaadikara (2022) extract tabular data from images algorithmically, while our proposed system enhances accuracy in influencer agency transaction detail extraction using TF-IDF and SVC Model. While both focus on information extraction, our approach is tailored to address challenges specific to influencer agency transactions, optimizing accuracy within this domain.

Qisong et al. (2023) automate NOTAM information extraction using NLP, whereas our system focuses on accuracy in influencer agency transaction detail extraction using TF-IDF and SVC Model. While both aim for automation, our approach addresses challenges specific to influencer agency transactions, optimizing accuracy within this domain.

Alisa et al. (2021) extract immunosuppressive properties from biomedical texts, contrasting with our system's focus on accuracy in influencer agency transaction detail extraction using TF-IDF and SVC Model. While both tackle information extraction, our approach is tailored to address challenges specific to influencer agency transactions, optimizing accuracy within this domain.

Melnyk, Huymajer, Huemer, and Galler (2023) propose a digital documentation system for construction management, while our system focuses on accuracy in identifying transaction details within influencer agency records using TF-IDF and SVC Model. While both aim to enhance efficiency, our approach addresses specific challenges in influencer agency transactions, optimizing accuracy within this domain.

TongNan Huang (2023) introduces blockchain-based transaction database encryption, contrasting with our system's focus on accuracy in influencer agency transaction detail extraction using TF-IDF and SVC Model. While both aim to enhance security and reliability, our approach addresses challenges specific to influencer agency transactions, optimizing accuracy within this domain.

Li et al. (2020) propose a quaternary market transaction model for the electric heating market, contrasting with our system's focus on accuracy in influencer agency transaction detail extraction using TF-IDF and SVC Model. While both propose innovative transaction models, our approach addresses specific challenges in influencer agency transactions, optimizing accuracy within this domain.

Ni et al. (2023) introduce FLUID for continuous transaction processing in blockchains, while our system focuses on accuracy in identifying transaction details within influencer agency records using TF-IDF and SVC Model. While both aim to enhance efficiency in transactions, our approach addresses challenges specific to influencer agency transactions, optimizing accuracy within this domain.

Reddy and Kumar (2022) compare algorithms for credit card fraud detection, contrasting with our system's focus on accuracy in influencer agency transaction detail extraction using TF-IDF and SVC Model. While both aim to improve transaction security, our approach addresses specific challenges in influencer agency transactions, optimizing accuracy within this domain.

Overall, while existing research proposes innovative approaches to various transaction-related tasks, this research specifically targets accuracy in identifying transaction details within influencer agency records, utilizing TF-IDF and SVC Model tailored to this domain's challenges.

Furthermore, this research uses detailed evaluation metrics, including accuracy, precision, F1 Score, and processing speed, to provide a comprehensive assessment of the proposed solution's effectiveness.

Research aim and objectives

This research is dedicated to the rapid identification of invoice numbers within influencer agency records. Through the application of TF-IDF and SVC Model instead of Cosine Similarity, this approach prioritizes accuracy and confronts challenges linked to manual extraction. The core objective revolves around improving transaction record management, specifically within the domain of influencer marketing, by presenting practical and effective solutions. The incorporation of advanced techniques is geared towards streamlining processes and offering an efficient means of retrieving invoice numbers within the dynamic landscape of influencer agency transactions.

Limitations

This research utilizes TF-IDF and the SVC Model to improve invoice number retrieval, acknowledging certain limitations. The study predominantly evaluates the accuracy and precision of the algorithms with a dataset of 20 samples, raising concerns about the generalizability of findings to real-world scenarios. Variations in actual influencer marketing data may impact outcomes. The evaluation emphasizes efficiency through mathematical metrics, potentially overlooking practical challenges inherent in diverse influencer marketing situations. These limitations underscore the importance of a comprehensive analysis, recognizing the need for broader considerations beyond the specific context of this research.

Hypothesis

Leveraging TF-IDF and the SVC Model for invoice number retrieval is anticipated to significantly enhance accuracy and precision in contrast to manual extraction. The proposed framework aims to achieve heightened efficiency in managing influencer agency transaction records.

Outline of the Research

This study introduces a practical solution aimed at enhancing the identification of invoice numbers within influencer agency transaction records. As the demand for influencer marketing continues to surge, the need for an efficient invoice management process becomes imperative. By leveraging TF-IDF and the SVC Model, this research addresses the crucial task of accurately locating and extracting invoice numbers. Additionally, it confronts challenges associated with manual extraction, evaluating the proposed approach using precision and accuracy metrics. Through a comparative analysis with existing methods in automated information extraction and invoice management, this paper positions itself as a faster and more practical solution for identifying invoice numbers in influencer agency records. The overarching goal is to improve the efficiency and effectiveness of invoice management processes.

 

METHOD

The methodology employed in this research is meticulously designed to bolster the retrieval of invoice numbers in influencer agency transaction records through the synergistic utilization of TF-IDF and Support Vector Classification (SVC) Model techniques. The methodology has been refined based on comprehensive insights from the research details provided above. The outlined steps encompass data collection, pre-processing, TF-IDF calculation, SVC Model implementation, and a thorough evaluation using diverse metrics.

 

1)      Data Collection

The initial step involves gathering a diverse dataset of influencer agency transaction records from the internal database named "Cyclone Management." The dataset includes essential details such as dates, brand names, invoice numbers, amounts, frequency, and category.

 

2)      Pre-processing of Data

The collected transaction records undergo a meticulous pre-processing phase. This involves cleaning and standardizing the records to ensure consistency. The pertinent information, including dates, brand names, invoice numbers, amounts, frequency and category, is extracted from the dataset.

 

3)      TF-IDF Calculation

a.       Term Frequency

The TF for each term in the documents stored in the "Cyclone Management" database is computed using the formula:

 

 

b.      Inverse Document Frequency (IDF)

The IDF for each term across all documents in the "Cyclone Management" database is calculated using the formula.

c.       TF-IDF Score

The TF-IDF scores for each term in the sample documents are determined using the formula:
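For reference, the three quantities above are commonly defined as follows. This is the standard textbook formulation and is an assumption here, since the exact variant used in this study (for example, a smoothed IDF) is not stated. The symbols are illustrative: $f_{t,d}$ is the number of occurrences of term $t$ in document $d$, $N$ is the number of documents, and $n_t$ is the number of documents containing $t$.

```latex
\mathrm{TF}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{IDF}(t) = \log\frac{N}{n_t}, \qquad
\text{TF-IDF}(t,d) = \mathrm{TF}(t,d) \times \mathrm{IDF}(t)
```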

 

 

4)      SVC Model Implementation

The Support Vector Classification (SVC) Model is initialized and optimized using grid search and cross-validation to achieve the best-performing model.
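A minimal sketch of this step is given below. It assumes scikit-learn, which the identifiers best_svc and accuracy_score used in this paper suggest but which is not confirmed; the records, labels, and parameter grid are hypothetical and serve only to illustrate grid search with cross-validation over a TF-IDF and SVC pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical transaction records and category labels (not the study's data).
records = [
    "INV-001 brand alpha skincare campaign",
    "INV-002 brand alpha skincare repeat order",
    "INV-003 brand alpha skincare bundle",
    "INV-004 brand alpha skincare launch",
    "INV-005 brand beta gadget review",
    "INV-006 brand beta gadget unboxing",
    "INV-007 brand beta gadget giveaway",
    "INV-008 brand beta gadget promo",
]
labels = ["beauty"] * 4 + ["tech"] * 4

# TF-IDF vectorization followed by a support vector classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svc", SVC()),
])

# Assumed search space: kernel type and regularization strength C.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}

# 2-fold cross-validation because the toy dataset is tiny.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(records, labels)

# The best pipeline, refit on all records.
best_svc = search.best_estimator_
print(search.best_params_)
print(best_svc.predict(["INV-009 brand beta gadget teaser"]))
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is refit inside each cross-validation fold, so no information from the held-out fold leaks into training.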

 

a.       Linear Kernel

The linear kernel is defined by the dot product of the input samples.

$$K(x_i, x_j) = x_i^{\top} x_j$$

b.      Radial Basis Function (RBF) Kernel

The RBF kernel, also known as the Gaussian kernel, measures the similarity between two samples based on the Euclidean distance.

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$

where $\gamma > 0$ controls the width of the kernel.

c.       Decision Function Values

$$f(x) = \sum_{i} \alpha_i y_i K(x_i, x) + b$$

In the specific context, the decision function values are calculated for the test set (Xtest) using the trained SVC model (best_svc). These values serve as a basis for predicting the class labels of the test samples. The grid search utilized during the model training optimizes hyperparameters, such as C for regularization, ensuring the SVC model effectively learns and generalizes from the training data to make accurate predictions on new, unseen samples.
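The kernels and the decision function described in this step can be sketched in plain Python. This is an illustration under stated assumptions rather than the study's implementation: the support vectors, dual coefficients, intercept, and the value of gamma are all hypothetical.

```python
import math

def linear_kernel(x, z):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * squared Euclidean distance); gamma=0.5 is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, dual_coefs, intercept, kernel):
    """Sum of coef_i * K(sv_i, x) plus the intercept, where coef_i = alpha_i * y_i."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + intercept

# Hypothetical trained parameters for a two-feature problem.
support_vectors = [[1.0, 0.0], [0.0, 1.0]]
dual_coefs = [1.0, -1.0]
intercept = 0.0

sample = [2.0, 0.0]
value = decision_value(sample, support_vectors, dual_coefs, intercept, linear_kernel)
print(value)                                   # positive, so the positive class is predicted
print(rbf_kernel([1.0, 0.0, 2.0], [0.5, 1.0, 2.0]))
```

The sign of the decision value gives the predicted class in the binary case, and its magnitude indicates how far the sample lies from the separating boundary.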

 

5)      Predictions and Ranking

a.       Prediction Function

The prediction function applies the trained SVM model (best_svc) to the TF-IDF vectors of the testing data (Xtest). The SVM model utilizes its decision function, which was learned during training, to assign a predicted label to each document in the testing set. The predicted labels are stored in the vector y.

 

b.      Ranking Function

Ranked_indices=

The ranking function operates on the decision function values produced by the trained SVM model (best_svc) for the TF-IDF vectors of the testing data (Xtest). These decision function values represent the model's confidence or certainty about the classification of each document.

The function uses the argsort operation to sort the indices of the testing samples by their decision function values. The result is the vector ranked_indices, in which each element is the index of the test document that occupies that position in the sorted order. Because argsort sorts in ascending order, the documents with the highest decision function values, and therefore the highest confidence in the predicted category, appear at the end of the vector unless the order is reversed.
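The prediction and ranking steps can be illustrated with a short NumPy sketch. The decision values below are hypothetical, the sign rule for the label applies to the binary case, and reversing the argsort output to place the most confident document first is an assumption, since the original ranking expression is not reproduced here.

```python
import numpy as np

# Hypothetical decision function values for five test documents.
decision_values = np.array([0.8, -1.2, 2.5, 0.1, -0.4])

# Binary case: a positive decision value maps to the positive class (1).
predicted = np.where(decision_values > 0, 1, 0)

# argsort returns indices in ascending order of decision value.
ascending = np.argsort(decision_values)

# Reverse so that the document with the highest decision value comes first.
ranked_indices = ascending[::-1]

print(predicted.tolist())       # [1, 0, 1, 1, 0]
print(ascending.tolist())       # [1, 4, 3, 0, 2]
print(ranked_indices.tolist())  # [2, 0, 3, 4, 1]
```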

 

6)      Evaluation Metrics

a.       Accuracy:

Accuracy measures the proportion of correctly predicted instances among the total instances. In the code, it is calculated by dividing the number of correct predictions (accuracy_score) by the total number of predictions.

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

b.      Precision:

Precision evaluates the accuracy of positive predictions. It is calculated by dividing the number of true positives (TP) by the sum of true positives and false positives (FP).

$$\text{Precision} = \frac{TP}{TP + FP}$$

c.       F1 Score:

F1 Score is the harmonic mean of precision and recall. It considers both false positives and false negatives. The code calculates it using the precision and recall scores. The formula emphasizes the balance between precision and recall.
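The three metrics can be worked through on a small example. The sketch below computes them from confusion counts in plain Python. The label vectors are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention; libraries handle that case differently.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def safe_div(numerator, denominator):
    """Return 0.0 for an undefined ratio (assumed convention)."""
    return numerator / denominator if denominator else 0.0

def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = safe_div(tp + tn, len(y_true))
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```

Because F1 multiplies precision and recall in its numerator, it drops to zero whenever either of them is zero, even if accuracy is high on an imbalanced dataset.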

 

7)      System Implementation

a.       Invoice Number Retrieval

The identified documents in the "Cyclone Management" database are scanned, and the invoice numbers are extracted as potential matches for each set of sample data.

b.      Integration into Transaction Record Management System:

The TF-IDF and SVM retrieval system is seamlessly integrated into the existing transaction record management system. This integration ensures compatibility through the design of standardized data formats and interfaces, facilitating smooth interaction with other modules of the system.

 

8)      Processing Speed Calculation

The time taken for the entire process, encompassing both manual extraction and the proposed system, is recorded. The speedup percentage, a measure of efficiency improvement, is calculated using the following formula:

$$\text{Speedup Percentage} = \frac{T_{\text{manual}} - T_{\text{proposed}}}{T_{\text{manual}}} \times 100$$

A higher speedup percentage indicates a more efficient and faster system, showcasing the advantages of the proposed TF-IDF and SVC Model-based approach.
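The speedup calculation is simple enough to check directly. The helper below is a hypothetical illustration; the example call uses the timings reported in the results of this paper (20 minutes for manual extraction and 0.44 minutes for the proposed system), and both arguments must be given in the same unit.

```python
def speedup_percentage(manual_time, proposed_time):
    """Percentage reduction in processing time relative to manual extraction."""
    if manual_time <= 0:
        raise ValueError("manual_time must be positive")
    return (manual_time - proposed_time) / manual_time * 100

# Both times in minutes.
print(round(speedup_percentage(20, 0.44), 2))  # 97.8
```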

 

RESULT AND DISCUSSIONS

The results of this research on the retrieval of invoice numbers in influencer agency transaction records cover four areas: evaluation metrics, brand analysis, dashboard visualization, and processing speed comparison.

 

1. Evaluation Metrics

[Figure: evaluation metrics of the proposed method]

The accuracy of the proposed TF-IDF and SVC method is 91.15%, which indicates a relatively high rate of correct predictions on influencer agency transactions. Precision stands at 100%, meaning that no false positives were produced. The F1 Score, however, is 0.0, which shows that the model struggles to balance precision and recall: because F1 is the harmonic mean of the two, a value of 0.0 alongside 100% precision implies that recall is effectively zero.

 

2. Brand Analysis

This section shows the results after implementing TF-IDF and the SVC Model. The data include rank, brand name, frequency rank, and decision value rank, and are presented in two tables: the top 10 brands with the highest frequency and the 10 brands with the lowest frequency.

 

[Table: top 10 brands with the highest frequency]

[Table: 10 brands with the lowest frequency]

 

Each table provides relevant information for following up with different brands within the transaction records.

 

3. Dashboard Visualization

[Figure: dashboard visualization of the top 10 brands by frequency]

This section shows visualizations such as the top 10 brands by frequency, together with information intended to increase order frequency.

 

The information includes discounts for brands based on their transaction frequency. The discounts are subject to terms and conditions: the lowest-frequency and highest-frequency brands receive different discounts, which remain active for only 3 months, and the suggestions depend on the data available in the influencer database. The dashboard is intended to support more targeted, data-driven decisions about which brands to follow up, based on the transaction records.

 

4. Processing Speed Comparison

Invoice numbers are currently recorded manually and must be compared against one another, which takes about 20 minutes. The proposed system takes only 0.44 minutes, that is, less than 1 minute.


 

 

Speedup Percentage = [(Time taken for Manual Extraction - Time taken for Proposed System) / Time taken for Manual Extraction] * 100

With both times expressed in minutes:

Speedup Percentage = [(20 - 0.44) / 20] * 100 ≈ 97.8%

The proposed method is therefore considerably faster than manual extraction, in line with the reported 97.81% reduction in processing time.

 

CONCLUSION

The application of TF-IDF and the SVC Model to invoice number retrieval in influencer agency transaction records yields notable gains in accuracy and efficiency. With 91.15% accuracy and 100% precision, the method performs well at identifying invoice numbers, although the F1 Score of 0.0 shows that balancing precision and recall remains a challenge. This research also provides a dashboard for data visualization, which supports transaction record management in the influencer marketing domain, in particular following up with brands for future collaborations based on their invoice records, and enhances brand analysis and order frequency insights. The proposed solution took only 0.44 minutes, compared with 20 minutes for manual extraction, a 97.81% reduction in processing time. Future research may tune the model to achieve a better balance between precision and recall and thereby improve overall performance. Integrating Natural Language Processing (NLP) is another area for further improvement. Overall, this research presents a practical and effective solution for enhancing invoice management in influencer agency transactions, contributing to the rapidly growing financial workflows of influencer marketing.

BIBLIOGRAPHY

 

Huang, TongNan. (2023). Transaction Database Encryption Technology based on Blockchain Technology. 2023 8th International Conference on Information Systems Engineering (ICISE), 342-345. IEEE. doi: 10.1109/ICISE60366.2023.00078

Huang, Zheng, Chen, Kai, He, Jianhua, Bai, Xiang, Karatzas, Dimosthenis, Lu, Shijian, & Jawahar, C. V. (2019). ICDAR2019 competition on scanned receipt OCR and information extraction. 2019 International Conference on Document Analysis and Recognition (ICDAR), 1516-1520. IEEE. doi: 10.1109/ICDAR.2019.00244

Kadhim, Ammar Ismael. (2019). Term weighting for feature extraction on Twitter: A comparison between BM25 and TF-IDF. 2019 International Conference on Advanced Science and Engineering (ICOASE), 124-128. IEEE. doi: 10.1109/ICOASE.2019.8723825

Li, Shupeng, Liu, Yingchao, Ren, Shuai, Huo, Xianxu, Yu, Jiancheng, Li, Qing, & Zhou, Ying. (2020). New Assumption of Regenerative Electric Heating Market Model from the Perspective of Transaction Cost. 2020 Asia Energy and Electrical Engineering Symposium (AEEES), 375-379. IEEE. doi: 10.1109/AEEES48850.2020.9121468

Manjari, K. Usha, Rousha, Syed, Sumanth, Dasi, & Devi, J. Sirisha. (2020). Extractive Text Summarization from Web pages using Selenium and TF-IDF algorithm. 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184), 648-652. IEEE. doi: 10.1109/ICOEI48184.2020.9142938

Melnyk, Oleksandr, Huymajer, Marco, Huemer, Christian, & Galler, Robert. (2023). Digitalization in the Construction Industry: The Case of Documentation and Invoicing in Tunneling. 2023 IEEE 25th Conference on Business Informatics (CBI), 1-10. IEEE. doi: 10.1109/CBI58679.2023.10187588

Ngan, Seon Choon Han, Lee, Ming Jie, & Khor, Kok Chin. (2023). Automating Conference Paper Assignment Using Classification Algorithms Incorporated with TF-IDF Vectorisation. 2023 IEEE Symposium on Industrial Electronics & Applications (ISIEA), 1-6. IEEE. doi: 10.1109/ISIEA58478.2023.10212219

Nugawela, Methmi, Abeywardena, Kavinga Yapa, & Mahaadikara, Hansika. (2022). Algorithmically Navigating Complex Tabular Structures in Images for Information Extraction. 2022 3rd International Informatics and Software Engineering Conference (IISEC), 1-6. IEEE. doi: 10.1109/IISEC56263.2022.9998220

Pramono, Luthfan Hadi, Rohman, Arief Syaichu, & Hindersah, Dan Hilwadi. (2013). Modified weighting method in TF* IDF algorithm for extracting user topic based on email and social media in Integrated Digital Assistant. 2013 Joint International Conference on Rural Information & Communication Technology and Electric-Vehicle Technology (RICT & ICeV-T), 1-6. IEEE. doi: 10.1109/rICT-ICeVT.2013.6741547

Qiang, Cheng, & Zhong-min, Du. (2022). A NLP Application of Automated Symptom Information Extraction from TCM Medical Cases. 2022 IEEE 2nd International Conference on Computer Systems (ICCS), 35-39. IEEE. doi: 10.1109/ICCS56273.2022.9988199

Qisong, Hu, Zixuan, Wu, Wen, Yin, Minrui, Chen, Guoqiang, Chen, & Yang, Yang. (2023). Research on NOTAM Information Extraction of Civil Aviation with NLP. 2023 IEEE 5th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), 520-523. IEEE. doi: 10.1109/ICCASIT58768.2023.10351768

Reddy, P. Raghavendra, & Kumar, A. Sivanesh. (2022). Credit Card Fraudulent Transactions Prediction Using Novel Sequential Transactions by Comparing Light Gradient Booster Algorithm Over Isolation Forest Algorithm. 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), 2, 563-567. IEEE. doi: 10.1109/ICIPTM54933.2022.9754211

Yang, En, & Long, Zhaohua. (2023). Research on the Weighting Method Based on Tf-IDF and Apriori Algorithm. 2023 IEEE 6th International Conference on Information Systems and Computer Aided Education (ICISCAE), 1003-1005. IEEE. doi: 10.1109/ICISCAE59047.2023.10393523

 

 

© 2024 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY SA) license (https://creativecommons.org/licenses/by-sa/4.0/).