Sunday, 15 March 2026

Curriculum Vitae (CV) Scoring

Curriculum Vitae (CV) scoring systems are becoming increasingly popular as a way to help recruiters identify the most relevant candidates for a given Job Description (JD). With the rise of online applications, it is common for a single job posting to attract hundreds of CVs. Given the limited capacity of recruiters, manually reviewing each submission is impractical. In many cases, only a small fraction — perhaps 10 out of 100 or more — are assessed, while the majority are overlooked. This creates inefficiencies and risks unfair hiring outcomes.

A CV scoring system addresses this challenge by automating the initial screening process. Each CV is evaluated against the JD, generating a relevance score that allows recruiters to quickly identify the strongest candidates. By streamlining the first stage of assessment, such systems ensure that more applicants receive fair consideration, while recruiters can focus their time and expertise on deeper evaluation of the most promising profiles.

The following steps outline how a simple CV scoring system can be implemented. In practice, real-world systems are considerably more complex, but this simplified model serves to illustrate the core concepts. For the sake of demonstration, we assume that each CV contains three primary sections: Education, Skills, and Experience. It is important to note that the technical process of parsing CVs — typically submitted in formats such as Word or PDF — is not covered here. Instead, the focus is on the scoring logic applied once the relevant information has been extracted.

Step 1: Transform the texts in the CV and JD into embeddings so the system captures meaning rather than just keywords. An embedding is simply a series of numbers; each number does not necessarily map directly to any single word in the CV or JD.

Example of JD and CV embeddings:

JD embedding: [ 0.12, -0.45, 1.32, 0.88, ... ]
CV embedding: [ 0.10, -0.42, 1.29, 0.90, ... ]

Following is a Python example of converting a sentence into an embedding using the all-MiniLM-L6-v2 model.

from sentence_transformers import SentenceTransformer

# Load a pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Your sentence
sentence = "Led a large scale enterprise IT project."

# Convert sentence into embedding
embedding = model.encode(sentence)

print("Sentence:", sentence)
print("Embedding shape:", embedding.shape)
print("Embedding vector:", embedding)

The output is a vector of 384 numbers, the embedding dimension of the all-MiniLM-L6-v2 model. Due to its length, only a portion is reproduced here:

Sentence: Led a large scale enterprise IT project.
Embedding shape: (384,)
Embedding vector: [ ... ]
It is important to note that the embeddings generated depend on the specific AI model used. When the model is retrained, the resulting embeddings can differ, as the training process influences how the model represents and interprets data.

Step 2: Compute relevance based on embeddings so that the system knows how similar a CV is to a JD but not yet how suitable the CV is. Cosine Similarity is one of the most common methods used to compute relevance.

Example of calculating relevance scores using Cosine Similarity:

JD embedding = [ 1 , 1 ]
CV embedding = [ 2 , 2 ]
Cosine Similarity Score = 1 (same direction, maximally similar)

JD embedding = [ 1 , 1 ]
CV embedding = [ -1 , 1 ]
Cosine Similarity Score = 0 (orthogonal, unrelated)

JD embedding = [ 1 , 1 ]
CV embedding = [ 1 , 0 ]
Cosine Similarity Score ≈ 0.71 (moderately similar)
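These toy scores can be checked by hand: Cosine Similarity is the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch in plain Python (the 2-D vectors are purely illustrative; real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

jd = [1, 1]
print(cosine_similarity(jd, [2, 2]))   # 1.0 (same direction)
print(cosine_similarity(jd, [-1, 1]))  # 0.0 (orthogonal)
print(cosine_similarity(jd, [1, 0]))   # ~0.707 (moderately similar)
```

Note that the score depends only on the angle between the vectors, not their lengths, which is why [1, 1] and [2, 2] score a perfect 1.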

Cosine Similarity Score    Meaning
Close to 1.0               very similar meaning
Around 0.6 to 0.8          moderately similar
Near 0                     unrelated

A negative Cosine Similarity score indicates that the CV content is not just unrelated but actively inconsistent with what the JD describes. In practice, such cases are very rare.

A typical CV scoring system usually compares each section (i.e. Education, Skills, Experience) of the CV against the corresponding section of the JD, computing a Cosine Similarity score per section.

Following is a Python example of computing the Cosine Similarity score between a text from the CV and the corresponding text from the JD, in this case the Education section.

from sentence_transformers import SentenceTransformer, util

# Load a local embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

sentence1 = "Bachelor/Master of Science in Computing"
sentence2 = "Bachelor of Science in Information Technology"

# Generate embeddings
s1_embedding = model.encode(sentence1, convert_to_tensor=True)
s2_embedding = model.encode(sentence2, convert_to_tensor=True)

# Compute cosine similarity
similarity = util.cos_sim(s1_embedding, s2_embedding)

print(f"Similarity score: {similarity.item():.4f}")

The output is as follows.

Similarity score: 0.7290

It is important to note that computing Cosine Similarity is purely based on a fixed mathematical formula, so the results remain consistent. In contrast, converting text into embeddings depends on the specific AI model used, and retraining or changing the model can produce different embeddings.

Step 3: Perform scoring using a supervised model to determine the final score.

Formula to calculate raw score for each CV:

X = (w1 * education_similarity) + (w2 * skills_similarity) + (w3 * experiences_similarity) + bias

Weights (denoted by w) are numbers that tell the system how important each section's Cosine Similarity score is when computing the CV's final score. The bias is the intercept of the CV scoring model: it sets the baseline score before any section's Cosine Similarity score is applied, reflecting overall hiring tendency and correcting for real-world data imbalance. Think of the bias as the prior likelihood of a CV being shortlisted before it is assessed by recruiters; this same value is applied to all CVs. If the objective is simply to rank the CVs by final score, the bias can be ignored (defaulted to 0), since adding the same constant to every score does not affect the ranking. However, if there is a pre-determined threshold score for shortlisting CVs, the bias becomes relevant.
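The weighted-sum formula can be sketched in a few lines of Python. The weights and similarity values below are illustrative assumptions, not outputs of a real model:

```python
def raw_score(similarities, weights, bias=0.0):
    """Weighted sum of per-section Cosine Similarity scores, plus bias."""
    return sum(weights[section] * similarities[section] for section in weights) + bias

# Illustrative values only
weights = {"education": 0.2, "skills": 0.5, "experience": 0.3}
similarities = {"education": 0.76, "skills": 0.46, "experience": 0.28}

print(round(raw_score(similarities, weights), 3))  # 0.466
```

Because the bias shifts every CV's score by the same amount, changing it re-positions all scores equally without altering their relative order.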

After calculating the raw score for each CV, a Sigmoid function may be applied if the range of the scores is too wide to present conveniently on a graph for reporting purposes. This step is optional and is not necessary if the purpose is solely to score and rank the CVs.
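As a sketch, the standard logistic Sigmoid squashes any raw score into the open interval (0, 1):

```python
import math

def sigmoid(x):
    """Map any real-valued raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(round(sigmoid(0.0), 3))    # 0.5 (a raw score of 0 maps to the midpoint)
print(round(sigmoid(0.466), 3))  # 0.614
print(round(sigmoid(-3.0), 3))   # 0.047
```

Because the Sigmoid is strictly increasing, it preserves the ranking of the CVs while bounding the scores for presentation.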

Sample Use Case

The following data for the JD and three CVs are stored in individual JSON files and input into the prototype (implemented in Python).

Section: Education

  JD:   Bachelor of Science in Computer Science or related field
  CV A: Bachelor of Science in Information Systems
  CV B: Master of Science in Information Systems
  CV C: Bachelor of Arts in Multimedia Design

Section: Skills

  JD:   Python; Java; C++; HTML; CSS; JavaScript; React; Database management; Problem-solving; Team collaboration
  CV A: Python; Java; HTML; CSS; Problem-solving
  CV B: Python; JavaScript; React; Database management; Team collaboration
  CV C: HTML; CSS; UI/UX design; Figma; Basic JavaScript

Section: Experiences

  JD:   Software Engineer or Developer role, 2+ years; Experience with web application development; Internship or project work in software systems
  CV A: Volunteer Tutor, Community Center, 2022; IT Support Analyst, TechCorp, 2021–2023
  CV B: Junior Developer, WebSolutions, 2020
  CV C: Front-End Developer, CreativeApps, 2021–Present; Web Designer, Freelance, 2019–2021


Suppose the weights are defined as follows: Education = 20% (0.2), Skills = 50% (0.5), Experience = 30% (0.3), with the bias term set to zero. Under these conditions, the scoring results are obtained as follows:
  • The values in the "Education", "Skills", and "Experience" columns represent the Cosine Similarity scores calculated for each section.

  • The "Total Score" column is derived using the weighted formula described in Step #3 above.

  • The "Norm Total Score" column is produced by applying the Sigmoid function to the total score, ensuring the results are mapped to a bounded range for easier comparison.
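Under these weights, the Total Score and Norm Total Score columns can be reproduced from the per-section Cosine Similarity scores with a few lines of Python (small last-digit differences from the published table can arise from rounding of the similarity scores):

```python
import math

weights = {"education": 0.2, "skills": 0.5, "experience": 0.3}

# Per-section Cosine Similarity scores from the use case
section_scores = {
    "A": {"education": 0.759, "skills": 0.457, "experience": 0.283},
    "B": {"education": 0.560, "skills": 0.850, "experience": 0.486},
    "C": {"education": 0.547, "skills": 0.338, "experience": 0.570},
}

for cv, scores in section_scores.items():
    total = sum(weights[s] * scores[s] for s in weights)  # bias = 0
    norm = 1.0 / (1.0 + math.exp(-total))                 # Sigmoid
    print(f"CV {cv}: total={total:.3f}, norm={norm:.3f}")
```

Running this ranks CV B first on total score, matching the table below.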

CV    Education (0.2)    Skills (0.5)    Experiences (0.3)    Total Score    Norm Total Score
A     0.759              0.457           0.283                0.465          0.614
B     0.560              0.850           0.486                0.683          0.664
C     0.547              0.338           0.570                0.449          0.611


The results obtained are interpreted as follows.
  • CV B achieves the highest total score because it contains the largest number of skills that align with the JD. Since the Skills section carries the greatest weight among the three categories, this strong alignment significantly boosts CV B’s overall score.

  • CV A receives a higher education score than CV B because the JD explicitly requires a Bachelor's degree. Although CV B lists a Master's degree in the same field (Information Systems), the Cosine Similarity formula assigns a stronger match to CV A, since its education level corresponds exactly to the requirement stated in the JD. In effect, this implementation penalises CVs that exceed the specified education level, reflecting a reality of real-world hiring decisions: while higher qualifications may be valuable, automated scoring models often prioritise strict alignment with the stated criteria.

The explanation above reflects my personal understanding and simplified implementation of a typical CV scoring system. I welcome any feedback or comments you may have. If you are interested in obtaining a copy of the Python code used in this implementation, please feel free to leave a comment as well.