Using Google more efficiently part IV: domain specification

Posted by Greten on 03 May 2013 under Efficient Internet Research

This entry is the fourth part of a serial post on how to use Google more efficiently for your research. Click here for an overview and the other parts of this serial post, although you can simply keep reading: you will still understand this entry without reading the introduction.

In my last article, I discussed how to search for exact phrases and how to use the wildcard. In this entry, we will cover how to limit the search results to a specific website, a section of a website, or a top-level domain (TLD).

Limiting search results to a specific website

Sometimes we need to restrict the search results to a single website. To do this, include the 'site:' operator in the search query. Right after 'site:', type the address of the website you want to search, without the http:// part and without the www part if the address has one (not all websites use www).

For example, suppose you are researching solar eclipses and want to limit your search results to pages on NASA's website. You can type the following in the search box:

solar eclipse site:nasa.gov

When you click the search button or press the enter key, you will notice that all web pages in the search results are under nasa.gov.
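What you type in the search box is just a text string that the browser sends to Google. The minimal Python sketch below shows how the query above ends up in the address bar, assuming the familiar https://www.google.com/search?q=... form of a results-page URL (an observation about how the address bar looks, not an official API). The function name search_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the results page takes the query in the "q" parameter of this URL.
SEARCH_URL = "https://www.google.com/search"

def search_url(query: str) -> str:
    """Build a results-page URL for a query typed exactly as in the search box."""
    # urlencode turns spaces into "+" and ":" into "%3A"
    return SEARCH_URL + "?" + urlencode({"q": query})

print(search_url("solar eclipse site:nasa.gov"))
```

The 'site:nasa.gov' part travels inside the same query string as the search words; it is Google, not the browser, that interprets it as an operator.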

Limiting search to a few specific websites

Suppose you want to limit your search to two or more websites. You can do this by repeating the 'site:' operator for each website and joining them with the OR operator. For example, if we also want to include results from Space.com and the BBC, we can type our search query as follows:

solar eclipse site:nasa.gov OR site:space.com OR site:bbc.co.uk

The OR operator is important here. It tells Google to return pages that contain "solar" and "eclipse" and come from nasa.gov, space.com, or bbc.co.uk. Without it, Google would treat the three 'site:' terms as all required, and no page can belong to three websites at once.
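When the list of websites gets long, the query is easy to assemble mechanically. Here is a small Python sketch, assuming you keep the websites in a list; the function name multi_site_query is hypothetical. It produces exactly the query shown above.

```python
def multi_site_query(terms: str, sites: list[str]) -> str:
    """Add one site: operator per website, joined with OR."""
    site_part = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} {site_part}"

print(multi_site_query("solar eclipse", ["nasa.gov", "space.com", "bbc.co.uk"]))
# solar eclipse site:nasa.gov OR site:space.com OR site:bbc.co.uk
```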

Limiting search to specific section of a website

You can use the 'site:' operator not only to limit your search to a website, but also to limit it to a specific section of a website, such as a subdomain. Just type 'site:subdomain.domain.tld' in the search box together with your search terms. For example:

solar eclipse site:science.nationalgeographic.com

In this example, the main website address is nationalgeographic.com, and 'science' is a subdomain under it.

You can also limit your search to a specific folder. Just type 'site:domain.tld/folder' in the search box together with your search terms. For example:

"supply and demand" site:cnn.com/transcripts

If the website has two or more subdomains with folders of the same name, e.g. archives.cnn.com/transcripts, edition.cnn.com/transcripts, etc., pages from all of those subdomains will appear in the results. This is also true if a subdomain has a folder with the same name as one of the folders in the main domain.

To avoid this, you can specify both the subdomain and the folder, although I think this will rarely be necessary. For example:

"supply and demand" site:archives.cnn.com/transcripts
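To make the subdomain and folder behaviour concrete, here is a simplified Python model of the matching described in this section. It is an assumption for illustration only: Google does not publish its exact matching rules, and the function name matches_site and the sample paths are hypothetical. In this model, a filter matches when the page's host is the given domain or any subdomain of it, and, if a folder is given, the page's path lies inside that folder.

```python
from urllib.parse import urlsplit

def matches_site(url: str, site_filter: str) -> bool:
    """Simplified model of site:domain or site:domain/folder (assumption, not Google's rules)."""
    domain, _, folder = site_filter.partition("/")
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the host name
    # The domain itself or any subdomain of it (edition.cnn.com matches cnn.com)
    host_ok = host == domain or host.endswith("." + domain)
    if not folder:
        return host_ok
    # The page must be the folder itself or something inside it
    path = parts.path
    folder_ok = path == "/" + folder or path.startswith("/" + folder + "/")
    return host_ok and folder_ok

# Sample addresses, made up for illustration
print(matches_site("http://edition.cnn.com/transcripts/page.html", "cnn.com/transcripts"))           # True
print(matches_site("http://edition.cnn.com/transcripts/page.html", "archives.cnn.com/transcripts"))  # False
```

The second call shows why naming the subdomain narrows the results: edition.cnn.com is a subdomain of cnn.com, but not of archives.cnn.com.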

Excluding a website from search results

Aside from limiting the search, the 'site:' operator can also be used to remove certain websites from the search results. In other words, every web page that matches the search query will be included in the results, except those coming from the specified website.

For example, suppose you are searching for "supply and demand" but do not want any articles from Wikipedia. You can type your search query as follows:

"supply and demand" -site:wikipedia.org

Notice the dash (-) before the 'site:' operator? It is the same negation operator that we use to omit web pages containing certain words. This time, however, it reverses the effect of the 'site:' operator: instead of including only the pages from the specified website, Google includes matching pages from all websites except the one specified.

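To exclude two or more websites, add a separate negated 'site:' term for each one. You do not need the OR operator here, because every page in the results must satisfy all of the exclusions at once. For example:

"supply and demand" -site:wikipedia.org -site:khanacademy.org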
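The Python sketch below builds an exclusion query by listing one negated 'site:' term per website, separated by spaces, and then imitates the effect on a few sample addresses. The function names are hypothetical, the addresses are made up for illustration, and the host check is a simplified assumption about how Google matches websites. The point it shows is that a page survives only if it passes every exclusion.

```python
from urllib.parse import urlsplit

def exclusion_query(terms: str, excluded_sites: list[str]) -> str:
    """Append one negated site: term per website to exclude."""
    return " ".join([terms] + [f"-site:{site}" for site in excluded_sites])

def on_site(url: str, site: str) -> bool:
    """True if the URL's host is the site or one of its subdomains (simplified)."""
    host = urlsplit(url).hostname or ""
    return host == site or host.endswith("." + site)

def apply_exclusions(urls: list[str], excluded_sites: list[str]) -> list[str]:
    """Keep only pages that are on none of the excluded websites."""
    return [u for u in urls if not any(on_site(u, s) for s in excluded_sites)]

excluded = ["wikipedia.org", "khanacademy.org"]
print(exclusion_query('"supply and demand"', excluded))
# "supply and demand" -site:wikipedia.org -site:khanacademy.org

sample_results = [
    "https://en.wikipedia.org/wiki/Supply_and_demand",
    "https://www.khanacademy.org/economics",
    "https://www.example.edu/econ/supply-and-demand.html",
]
print(apply_exclusions(sample_results, excluded))
# ['https://www.example.edu/econ/supply-and-demand.html']
```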

Limiting search results to a specific TLD

You can use the 'site:' operator not only to limit your search to, or exclude, individual websites, but also to target whole groups of websites that share the same top-level domain (TLD). TLDs are the endings of website addresses or URLs, such as .com, .net, .org, .edu, and .gov, among others.

In my earlier post about determining the credibility of a website, I mentioned that websites run by universities and government institutions are usually the most credible. These websites usually end with .edu and .gov, respectively. Thus, if there is too much information on the topic you are researching, or you would like to limit your search to educational or government websites, you can include 'site:edu' or 'site:gov', respectively, in the search box.

For example, suppose you are researching whale conservation and would like to limit your search to government websites. You can type your search query in the search box as follows:

whale conservation site:gov

Note, however, that the .gov domain is used only by government institutions of the United States. Many other countries use addresses in the form gov.cc, where cc is the country code. Philippine government websites, for instance, end in gov.ph because .ph is the country code of the Philippines; the website of the Department of Science and Technology is dost.gov.ph. If you would like to search Philippine government websites about whale conservation, you can type the following:

whale conservation site:gov.ph

To search the government websites of other countries, just replace ph with that country's code. Some other examples of country codes are .jp for Japan, .au for Australia, .ch for Switzerland, .sg for Singapore, .my for Malaysia, and .kz for Kazakhstan.

If you want to search government websites about whale conservation regardless of country, you can do it like this:

whale conservation site:gov OR site:gov.*

Here, the wildcard (*) means that anything can follow "gov.". We also use the OR operator to include United States government websites, because their addresses end in plain .gov without a country code.
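As a rough illustration of what 'site:gov OR site:gov.*' is meant to cover, the Python sketch below checks a host name against the two patterns described above: a plain .gov ending, or "gov" followed by one more label such as a country code. This is a simplified model under that assumption, not Google's actual matching; as noted above, not every country necessarily uses gov in its government addresses. The function name is hypothetical.

```python
def is_government_host(host: str) -> bool:
    """Simplified model of site:gov OR site:gov.* (assumption, not Google's rules)."""
    labels = host.lower().rstrip(".").split(".")
    # United States: the address ends in plain .gov
    if labels[-1] == "gov":
        return True
    # Other countries: "gov" followed by one more label, e.g. gov.ph
    return len(labels) >= 2 and labels[-2] == "gov"

for host in ["nasa.gov", "dost.gov.ph", "nationalgeographic.com"]:
    print(host, is_government_host(host))
```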

The other TLDs, such as .com, .net, .org, and .edu, may also be followed by country codes, so you can use the same method we discussed for government websites to search them. Just replace gov with com, net, org, or edu. Unlike .gov, however, the country code in these domains may not indicate the actual country of the website's owner. Many company websites end with just .com without any country code, and while most schools in the Philippines have addresses ending in .edu.ph, some end in plain .edu.

Just as you can exclude individual websites, you can also exclude all websites with a specific TLD from your search. For example, suppose you want to exclude all websites ending with .com:

whale conservation -site:com

It's the same negation sign that we use to exclude specific words and specific websites; this time, we use it to exclude a specific TLD.

That's all for now on domain-specific and TLD-specific searches in Google. The next part will be the last part of this serial post and will discuss the two remaining operators you can use: the file search and the similar search.
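Excluding a TLD works on the ending of the address rather than on a whole website. The Python sketch below imitates '-site:com' on a few sample addresses by dropping pages whose host name ends in that TLD. It is a simplified model: it looks only at the last label of the host, so under this assumption an address like example.com.ph would be kept, and whether Google treats such addresses the same way is not something this post establishes. The function names and addresses are hypothetical.

```python
from urllib.parse import urlsplit

def top_level_domain(url: str) -> str:
    """Return the last label of the URL's host name, e.g. "com" or "ph"."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    return host.rsplit(".", 1)[-1]

def exclude_tld(urls: list[str], tld: str) -> list[str]:
    """Simplified model of -site:<tld>: drop pages whose host ends in that TLD."""
    return [u for u in urls if top_level_domain(u) != tld]

sample_results = [
    "https://www.example.com/whales",
    "https://www.example.gov/whale-conservation",
    "https://www.example.gov.ph/whales",
]
print(exclude_tld(sample_results, "com"))
```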

Last updated on 03 May 2013.


Free and open source software technology and internet usage guide for teachers and other professionals in the education sector