Methodology — How RecentScam Aggregates Threat Intelligence

1. Where our data comes from

RecentScam aggregates scam-indicator data from established, publicly-accessible threat-intelligence feeds and community reports. We do not generate the underlying claims ourselves. Every record on this site traces back to one of the upstream sources below, and each record page displays its source attribution and (where available) a direct link to the original record.

Threat-intelligence feeds (domains)

URLhaus (abuse.ch) — community-curated database of malicious URLs used in malware distribution and phishing.
OpenPhish — verified phishing URLs identified through automated and manual review.
PhishTank — community-verified phishing reports operated by Cisco Talos.

Government complaint databases (phones)

FTC Consumer Sentinel Network — U.S. Federal Trade Commission Do-Not-Call complaint API.
FCC Open Data — U.S. Federal Communications Commission unwanted-calls complaint dataset.

Community reports

Publicly-posted scam encounters from Reddit communities (r/Scams, r/phishing, r/fraud) where users share specific phone numbers, emails, and URLs they have encountered. We extract identifiers via regex and preserve a link to the original post.
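
The regex extraction step described above might look roughly like the following. This is a minimal sketch, not our production extractor: the three patterns, the extract_identifiers function, and the record shape are illustrative assumptions, and the phone pattern only covers NANP-style numbers.

```python
import re

# Illustrative patterns only (assumptions); real extraction needs stricter validation.
URL_RE = re.compile(r'https?://[^\s<>")\]]+', re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# NANP-style numbers: optional +1, then 3-3-4 digits with common separators.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)


def extract_identifiers(text: str, permalink: str) -> list[dict]:
    """Pull URLs, emails, and phone numbers out of a post body.

    Every hit keeps the permalink so the record can link back to the post.
    """
    found = []
    for m in URL_RE.finditer(text):
        # Drop sentence punctuation that trails the URL in running text.
        value = m.group(0).rstrip(".,;:!?")
        found.append({"type": "url", "value": value, "source": permalink})
    for m in EMAIL_RE.finditer(text):
        found.append({"type": "email", "value": m.group(0).lower(), "source": permalink})
    for m in PHONE_RE.finditer(text):
        # Normalise to +1XXXXXXXXXX so duplicates collapse to one value.
        found.append({"type": "phone", "value": "+1" + "".join(m.groups()), "source": permalink})
    return found


post = ("Got a call from (555) 010-4477, then an email from "
        "billing@example.com linking to http://example.test/login.")
for hit in extract_identifiers(post, "https://reddit.example/post/1"):
    print(hit["type"], hit["value"])
```

Normalising each value before storage lets the same number written in different styles map to a single record.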

Direct submissions

Reports submitted directly by visitors via our Report a scam form. These are held for manual review before publication. We do not auto-publish anonymous third-party accusations.

What we do NOT use: we do not buy data brokers' "people-search" databases, do not republish unverified claims from anonymous tip lines, do not generate or invent identifiers, and do not include records from sources that cannot be cited by name.

2. How often the database updates

Upstream feeds are queried on a scheduled cadence, typically once per day, so new records generally appear in the database within about a day of being added to the source feed. Existing records are updated when fresh observations are added upstream (for example, a URLhaus URL changing from online to offline).

The "Database timeline" line on each record page shows when we first observed that identifier in our own database. The "First listed on URLhaus" or "Submitted to PhishTank" line, where present, shows when the upstream source first observed it.
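
One way to keep the two timestamps separate across repeated ingestion runs is sketched below. The Record fields and the upsert function are hypothetical names chosen for illustration, and an in-memory dict stands in for the database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Record:
    identifier: str
    source: str
    status: str
    first_seen_local: datetime               # "Database timeline": set once, by us
    upstream_first_seen: Optional[datetime]  # e.g. "First listed on URLhaus"
    last_updated: datetime


def upsert(db: dict, identifier: str, source: str, status: str,
           upstream_first_seen: Optional[datetime], now: datetime) -> Record:
    """Insert a new record, or refresh an existing one without touching first_seen_local."""
    key = (source, identifier)
    existing = db.get(key)
    if existing is None:
        db[key] = Record(identifier, source, status, now, upstream_first_seen, now)
    else:
        existing.status = status          # e.g. online -> offline
        existing.last_updated = now
        if existing.upstream_first_seen is None:
            existing.upstream_first_seen = upstream_first_seen
    return db[key]


db: dict = {}
day1 = datetime(2026, 5, 1, tzinfo=timezone.utc)
day2 = datetime(2026, 5, 2, tzinfo=timezone.utc)
listed = datetime(2026, 4, 30, tzinfo=timezone.utc)
upsert(db, "http://example.test/a", "urlhaus", "online", listed, day1)
rec = upsert(db, "http://example.test/a", "urlhaus", "offline", listed, day2)
print(rec.first_seen_local.date(), rec.status)
```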

3. Per-page analysis we add on top of the source data

Each record page combines the upstream source data with a layer of deterministic analysis we compute from the identifier itself. This includes:

Phone numbers: area-code geography (NANP lookup), number-type classification (toll-free / premium / geographic), digit-pattern flags (repeating, sequential, palindrome), and recognition of frequently-spoofed area codes per FCC enforcement records.
Domains: TLD-risk classification based on Spamhaus and Interisle Phishing Landscape rankings, brand-impersonation pattern detection against a list of ~40 commonly-impersonated brands, and structural anomaly flags (length, hyphen count, digit density).
Email addresses: free-webmail-provider detection, suspicious-prefix patterns (noreply, alert, security, etc.), and the same domain-level analysis applied to the sender's domain.

These derivations run synchronously from public lookup tables. No third-party API calls are made at request time, no AI models generate the analysis, and the same inputs always produce the same outputs.
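
To make "deterministic" concrete, here is a small sketch of two of the checks listed above: phone digit-pattern flags and domain structural flags. The function names, the run length of four digits, and the length, hyphen, and digit-density thresholds are all assumptions for illustration, not the values used on this site.

```python
import re


def phone_flags(number: str) -> dict:
    """Digit-pattern flags for a NANP number (illustrative definitions)."""
    digits = re.sub(r"\D", "", number)[-10:]  # keep the 10-digit national number
    steps = [int(b) - int(a) for a, b in zip(digits, digits[1:])]

    def has_run(step: int, length: int = 4) -> bool:
        # True if `length` consecutive digits each differ from the previous by `step`.
        run = 1
        for s in steps:
            run = run + 1 if s == step else 1
            if run >= length:
                return True
        return False

    return {
        "repeating": has_run(0),                   # e.g. 5555
        "sequential": has_run(1) or has_run(-1),   # e.g. 3456 or 6543
        "palindrome": len(digits) == 10 and digits == digits[::-1],
    }


def domain_flags(domain: str, max_len: int = 30, max_hyphens: int = 2,
                 max_digit_ratio: float = 0.3) -> dict:
    """Structural anomaly flags; the thresholds are hypothetical."""
    name = domain.lower().rstrip(".")
    labels = name.rsplit(".", 1)[0]  # everything left of the final label (TLD)
    digit_ratio = sum(c.isdigit() for c in labels) / max(len(labels), 1)
    return {
        "long_name": len(labels) > max_len,
        "many_hyphens": labels.count("-") > max_hyphens,
        "digit_heavy": digit_ratio > max_digit_ratio,
    }


print(phone_flags("+1 (800) 555-5555"))
print(domain_flags("secure-login-account-verify-update-billing.example"))
```

Both functions are pure: they read only their arguments and fixed thresholds, so the same identifier always yields the same flags.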

4. How the risk score is calculated

Each record carries a risk score from 0 to 100. The score is set when the record is created based on:

Upstream source authority — feeds with stronger verification processes (FTC, FCC, URLhaus, PhishTank) contribute higher base scores than community-reported identifiers.
Threat type — malware-distribution domains and confirmed-robocall phones score higher than unclassified-spam records.
Corroboration — identifiers appearing in multiple independent sources score higher than single-source listings.
Direct user reports — community-submitted reports through this site contribute to the score, weighted by reviewer trust.

The risk score is an internal heuristic, not a regulated rating. It is intended to help readers quickly compare records, not to substitute for their own judgement. A high score does not constitute legal fact or a finding of fraud.
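
As a rough illustration of how the four factors above could combine into a 0 to 100 value, consider the sketch below. Every number in it (base scores, bonuses, caps) and every name (AUTHORITATIVE, THREAT_BONUS, risk_score) is a made-up assumption; the actual weights are not published here.

```python
# All weights below are hypothetical and chosen only to illustrate the structure.
AUTHORITATIVE = {"ftc", "fcc", "urlhaus", "phishtank"}
THREAT_BONUS = {
    "malware_distribution": 20,
    "confirmed_robocall": 20,
    "unclassified_spam": 0,
}


def risk_score(sources, threat_type, report_trust=()):
    """Combine source authority, threat type, corroboration, and user reports.

    sources      -- names of the sources listing the identifier
    threat_type  -- key into THREAT_BONUS (unknown types add nothing)
    report_trust -- one reviewer-trust weight in [0, 1] per direct user report
    """
    unique = set(sources)
    if not unique:
        raise ValueError("a record needs at least one source")
    base = 60 if unique & AUTHORITATIVE else 30       # source authority
    threat = THREAT_BONUS.get(threat_type, 0)         # threat type
    corroboration = min(20, 10 * (len(unique) - 1))   # extra independent sources
    reports = min(10, 5 * sum(report_trust))          # trust-weighted user reports
    return max(0, min(100, round(base + threat + corroboration + reports)))


print(risk_score(["urlhaus", "phishtank"], "malware_distribution"))  # 90
print(risk_score(["reddit"], "unclassified_spam"))                   # 30
```

Capping each component keeps any single factor from dominating, and the final clamp keeps the result inside the 0 to 100 range.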

5. Editorial standards

Source attribution per record. Every record page names its upstream source and links to the source record when one exists. We do not publish records whose source we cannot disclose.
No invented identifiers. We do not generate synthetic phone numbers, emails, or domains. Every value on the site originates in a real upstream feed entry or user submission.
Conditional language. When a record has no direct user submissions, we do not claim that "users have reported" the identifier. The page distinguishes between threat-intelligence-feed listings and community-reported listings.
Legitimate-platform exclusions. Domains belonging to known legitimate platforms (such as code-hosting sites, file-sharing services, URL shorteners, and government/educational TLDs) are excluded from publication even when malicious activity is hosted within them. We do not label github.com, dropbox.com, or similar platforms as scams.
Public-interest framing. Records are published as threat-intelligence aggregations for consumer protection, not as defamatory claims about specific persons or businesses.
AI assistance disclosure. Some descriptive evidence text on record pages is drafted by language models from the underlying source data. This is editorial assistance, not source data. The underlying identifiers, source attribution, timeline data, and structural analysis are not AI-generated.
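
The legitimate-platform exclusion above can be sketched as a hostname check run before publication. The two platform names come from the text; the TLD set, the is_excluded name, and the choice to also exclude subdomains are assumptions made for this example.

```python
# Platform names taken from the text above; the real list is assumed to be longer.
EXCLUDED_PLATFORMS = {"github.com", "dropbox.com"}
# Assumption: "government/educational TLDs" is modelled here as .gov and .edu only.
EXCLUDED_TLDS = {"gov", "edu"}


def is_excluded(hostname: str) -> bool:
    """Return True if a hostname must not be published as a scam record."""
    host = hostname.lower().rstrip(".")
    if host.rsplit(".", 1)[-1] in EXCLUDED_TLDS:
        return True
    # Match the platform itself or any of its subdomains, but not look-alikes
    # such as "github.com.evil.example" or "notgithub.com".
    return any(host == p or host.endswith("." + p) for p in EXCLUDED_PLATFORMS)


for name in ["github.com", "gist.github.com", "github.com.evil.example", "agency.gov"]:
    print(name, is_excluded(name))
```

Matching on a leading dot plus the platform name, rather than a plain substring, is what keeps impersonation domains that merely contain a brand name eligible for listing.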

6. Dispute and removal process

Anyone identifiable as the owner of a phone number, email address, or domain that appears on this site can request its removal through our removal request form.

We grant removal requests on the following grounds:

Proven inaccuracy: the identifier was flagged in error and you can demonstrate the listing is a false positive.
Ownership change: you recently acquired a phone number or domain previously used by a different party.
PII exposure: a record contains private personal data not relevant to the underlying threat warning.
Legal order: a valid court order requires removal.

Standard response time is 72 hours. Removal requests are reviewed manually. Approved removals delete the record and prevent it from being re-added by subsequent ingestion runs.
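
Preventing a removed record from being re-added usually requires a suppression list that ingestion consults before writing. The sketch below shows the idea with an in-memory store; the Store class, its method names, and the normalisation rule are hypothetical.

```python
def normalize(identifier: str) -> str:
    # Assumption: identifiers are compared case-insensitively, ignoring outer whitespace.
    return identifier.strip().lower()


class Store:
    """In-memory stand-in for the record database and its suppression list."""

    def __init__(self) -> None:
        self.records: dict[str, set[str]] = {}  # identifier -> source names
        self.suppressed: set[str] = set()

    def ingest(self, identifier: str, source: str) -> bool:
        """Add or update a record; return False if the identifier is suppressed."""
        key = normalize(identifier)
        if key in self.suppressed:
            return False
        self.records.setdefault(key, set()).add(source)
        return True

    def approve_removal(self, identifier: str) -> None:
        """Delete the record and block future ingestion runs from re-adding it."""
        key = normalize(identifier)
        self.records.pop(key, None)
        self.suppressed.add(key)


store = Store()
store.ingest("Example.test", "urlhaus")
store.approve_removal("example.test")
print(store.ingest("example.test", "openphish"))  # False: stays removed
```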

7. Known limitations

We list these limitations so visitors can calibrate the weight they give our data:

Source feeds aggregate reports from many parties and are not always accurate. We inherit any false positives present upstream.
Phone numbers in caller ID can be spoofed, meaning a flagged number may not correspond to the actual caller. The presence of a number on this site does not establish that the listed number is the originating device.
Threat intelligence is time-sensitive. A scam campaign typically lasts hours to days. A historical listing does not mean the identifier is currently active.
Community-submitted reports are reviewed before publication but cannot be independently verified at scale.
The risk score is a heuristic. It is not a legal finding, a credit-style rating, or a guarantee of any kind.

8. Contact and corrections

For press enquiries, methodology questions, suspected inaccuracies, or partnership requests, contact us through the contact form. For record removal specifically, use the removal request form. We publish a changelog of methodology updates below.

9. Methodology changelog

26 May 2026: Added rich source-metadata capture (threat tags, reporter handles, complaint geography). Added legitimate-platform exclusion list. Initial publication of methodology page.
