A shadowed third party of the ubiquitous language

Alex Tatulchenkov
May 4, 2021

Ubiquitous Language is the term Eric Evans used in his book “Domain-Driven Design: Tackling Complexity in the Heart of Software” to describe a language shared by the whole team: developers, domain experts, and other stakeholders.

The main requirement of a ubiquitous language is that its terms are unambiguous. Usually this means that a developer and a domain expert refer to the same business concept when they use the same term. But let’s look at the edge cases from a technical perspective.

Assume we are building a PHP web application for sharing links to interesting things on the internet. The key term here is URL, and it appears in a user story:

As a user, I should be able to share valid links (URLs). Invalid links should be rejected.

Let’s imagine what people usually assume when they say a valid URL:

In addition, there is a third party that is always involved in every scenario: the PHP interpreter.

The PHP interpreter has its own opinion, or rather several:

We can validate a URL with filter_var($url, FILTER_VALIDATE_URL) or split it into parts with parse_url($url), and these functions disagree about what a URL is. For example, filter_var does not accept non-ASCII characters in a URL and treats https://мой.домен.бел as invalid.

One suggestion is to check only that the scheme and host parts are present:

parse_url($url, PHP_URL_SCHEME) && parse_url($url, PHP_URL_HOST);
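To make the disagreement concrete, here is a minimal sketch that runs both approaches on the same inputs. The helper names isValidByFilter and hasSchemeAndHost are made up for this illustration. The second helper compares against empty strings explicitly, since a bare && treats the string "0" as falsy in PHP, which would misjudge a host of 0.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: wraps PHP's built-in URL filter.
function isValidByFilter(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

// Hypothetical helper: the "scheme and host are present" check,
// written with explicit comparisons instead of relying on truthiness.
function hasSchemeAndHost(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host   = parse_url($url, PHP_URL_HOST);

    return is_string($scheme) && $scheme !== ''
        && is_string($host) && $host !== '';
}

// Print both verdicts for a few sample inputs.
foreach (['https://example.com/', 'https://мой.домен.бел', 'example.com'] as $url) {
    echo $url,
        ' | filter_var: ', var_export(isValidByFilter($url), true),
        ' | scheme+host: ', var_export(hasSchemeAndHost($url), true),
        PHP_EOL;
}
```

Neither verdict is "the" definition of a valid URL; the point is only that the two built-ins answer different questions.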

Please see the results of the tests to see the difference in the behavior of filter_var and parse_url.

So it looks like we can’t rely on PHP’s built-in functions to validate URLs. Can we use a well-known library instead?

There is the well-known beberlei/assert library, but in fact it reuses the validation rules from the Symfony framework. Symfony assumes that a URL is a string matched by this regexp:

As you can see from the comments, it even matches IPv6 addresses, but it does not match http://t_rost.rosfirm.ru/, which is a valid (or not :-)) URL: you can open it in your browser and it points to a real website. At this point we realize that the web server, too, has its own understanding of what a valid URL is. This happens because several RFCs each define part of what a URL is:

RFC 1738: Uniform Resource Locators (URL), Section 3.1: Common Internet Scheme Syntax

RFC 1034: Domain Names - Concepts and Facilities, Section 3.5: Preferred name syntax

RFC 2181: Clarifications to the DNS Specification, Section 11: Name syntax

RFC 3986: Uniform Resource Identifier (URI): Generic Syntax, Section 3.2.2: Host

What is more, every URI scheme has its own RFC (or several), and some of them need extra support (for example, IDNA/Punycode for host names).
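The underscore case shows how the RFCs pull in different directions: the preferred name syntax allows only letters, digits, and hyphens in a label, while RFC 2181 says the DNS itself places no such restriction on labels. Below is a minimal sketch of the stricter letter-digit-hyphen rule, assuming the later relaxation from RFC 1123 that lets a label start with a digit. The function name isLdhHostname is hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true when every dot-separated label follows the
// letter-digit-hyphen rule (1-63 characters, no hyphen at either end).
function isLdhHostname(string $host): bool
{
    if ($host === '' || strlen($host) > 253) {
        return false;
    }

    foreach (explode('.', $host) as $label) {
        if (!preg_match('/^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/iD', $label)) {
            return false;
        }
    }

    return true;
}

var_dump(isLdhHostname('rosfirm.ru'));        // bool(true)
var_dump(isLdhHostname('t_rost.rosfirm.ru')); // bool(false)
```

A validator built on this rule rejects the underscore host even though a DNS server may resolve it, which is exactly the kind of mismatch described above.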

OK, what about the browser? As always, different browsers may have their own interpretations, and different versions of the same browser may implement different RFCs. For example, Internet Explorer won’t allow cookies on subdomains that contain underscores.

From a human perspective, in most cases you can tell at a glance whether a URL is valid. But have you heard of DWord notation? The IP address is converted to the equivalent 32-bit number:

http://2728368388 is http://162.159.153.4/ (one of the medium.com IP addresses)
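The conversion is plain positional arithmetic: each octet is one byte of a 32-bit integer. A short sketch using PHP's built-in ip2long and long2ip, assuming a 64-bit PHP build (on 32-bit builds ip2long can return negative values):

```php
<?php
declare(strict_types=1);

// Each octet occupies one byte of the 32-bit number.
$dword = (162 << 24) + (159 << 16) + (153 << 8) + 4;

var_dump($dword);                   // int(2728368388)
var_dump(ip2long('162.159.153.4')); // int(2728368388)
var_dump(long2ip($dword));          // string(13) "162.159.153.4"
```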

What about the hexadecimal representation of an IP, http://0xd83ad74e? Is it clear that it points to http://216.58.215.78/ (google.com)?

Are you tired? What about Octal notation?

The same IP can be represented as: http://0330.0072.0327.0116.

Did I mention that in these notations you may add or omit the dots? So http://033016553516 is the same as http://0330.0072.0327.0116.

If that is not insane enough for you:

You may combine all the notations in a single URL. Super COMBO hybrid notation!

http://0xd8.0072.55118 (Hexadecimal Octal DWord)
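All of these forms follow one set of rules, loosely modeled on the classic inet_aton behavior: a 0x prefix means hexadecimal, a leading 0 means octal, and the last part fills whatever bytes remain. The sketch below normalizes such a host to dotted decimal. The function names are hypothetical, a 64-bit PHP build is assumed, and real browsers and resolvers may differ in the details.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: parse one part as hex ("0x.."), octal ("0.."), or decimal.
// Digit counts are capped so the value always fits in a 64-bit integer.
function parseIpPart(string $part): ?int
{
    if (preg_match('/^0x([0-9a-f]{1,8})$/iD', $part, $m)) {
        return (int) hexdec($m[1]);
    }
    if (preg_match('/^0([0-7]{0,11})$/D', $part, $m)) {
        return $m[1] === '' ? 0 : (int) octdec($m[1]);
    }
    if (preg_match('/^[1-9][0-9]{0,9}$/D', $part)) {
        return (int) $part;
    }
    return null;
}

// Hypothetical helper: returns the dotted-decimal form, or null if the
// host is not a legacy numeric IPv4 notation.
function legacyHostToDottedQuad(string $host): ?string
{
    $values = [];
    foreach (explode('.', $host) as $part) {
        $value = parseIpPart($part);
        if ($value === null) {
            return null;
        }
        $values[] = $value;
    }
    if (count($values) > 4) {
        return null;
    }

    // The last part covers all bytes not claimed by the leading parts.
    $last = array_pop($values);
    $remainingBytes = 4 - count($values);
    if ($last >= 256 ** $remainingBytes) {
        return null;
    }

    $address = $last;
    foreach ($values as $index => $value) {
        if ($value > 255) {
            return null;
        }
        $address += $value << (8 * (3 - $index));
    }

    $ip = long2ip($address);
    return $ip === false ? null : $ip;
}

foreach (['2728368388', '0xd83ad74e', '0330.0072.0327.0116', '033016553516', '0xd8.0072.55118'] as $host) {
    echo $host, ' => ', legacyHostToDottedQuad($host), PHP_EOL;
}
```

Every host after the first resolves to 216.58.215.78 under these rules, so a validator that compares strings would treat them as four different URLs.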

Conclusion:

Do not rely on the generally accepted meaning of a term; always specify strict validation rules for each term in the ubiquitous language.
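One way to pin a term down is to encode its rules in a value object, so the definition lives in one place and can be reviewed with the domain experts. The sketch below is only an illustration: the class name ShareableUrl and its three rules (http or https only, at least two letter-digit-hyphen labels, a last label that starts with a letter) are assumptions, not rules from any specification, and a real project would agree on its own.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: "URL" as this one application defines it.
final class ShareableUrl
{
    private string $value;

    private function __construct(string $value)
    {
        $this->value = $value;
    }

    public static function fromString(string $url): self
    {
        $parts = parse_url($url);

        // Rule 1 (assumed): both a scheme and a host must be present.
        if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
            throw new InvalidArgumentException('A shareable URL needs a scheme and a host.');
        }

        // Rule 2 (assumed): only web links can be shared.
        if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
            throw new InvalidArgumentException('Only http and https links can be shared.');
        }

        // Rule 3 (assumed): the host is a dotted ASCII name whose last label
        // starts with a letter. This rejects underscores, non-ASCII names,
        // and numeric forms such as 2728368388 or 0xd8.0072.55118.
        $label = '[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?';
        $last  = '[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?';
        if (!preg_match('/^(?:' . $label . '\.)+' . $last . '$/iD', $parts['host'])) {
            throw new InvalidArgumentException('The host is not an accepted domain name.');
        }

        return new self($url);
    }

    public function toString(): string
    {
        return $this->value;
    }
}

echo ShareableUrl::fromString('https://example.com/article')->toString(), PHP_EOL;
```

Whether an internationalized name should be accepted after Punycode conversion, or an underscore host tolerated, is then an explicit decision recorded in code rather than an accident of whichever function was called.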

P.S. If you want to play with different URL representations you may use https://www.browserling.com/tools/ip-to-dec

Alex Tatulchenkov

Senior Software Engineer at Intetics Inc., AppSec Manifesto evangelist