Collection of information from open sources. Sources of information, methods of searching and collecting data on the Internet. Using the Internet to collect sources

INTRODUCTION

The Internet is like a huge world library, with one significant difference: a library has a catalog for finding a book and, failing that, an experienced librarian you can ask. There is no complete catalog of the Internet. Nevertheless, searching the global computer network is possible, and this is perhaps one of its most important aspects. Searching the network relies on special servers whose information is maintained and updated almost automatically.

Today, when the Internet has become one of the main sources of information, Internet search is gaining more and more practical value. With the rapid growth in the amount of available data, the search procedure itself becomes more and more complicated.

The Internet is a global computer network that connects both users of other computer networks and individual PC users. It is slowly but surely becoming the main means of corporate communication, although so far it still gives way to the telephone.

There is a huge amount of information resources on the Web. According to some estimates, the number of documents has exceeded 65 million and continues to grow rapidly. Such a volume of information requires a properly organized search process and special technical means, such as search engines. A simple search for a fairly common keyword usually yields anywhere from tens of thousands to several million links. Working with that many documents is clearly almost impossible, especially since the overwhelming majority of them contain information irrelevant to the query.

Sources of information on the Internet differ in the way information is presented and, consequently, in the method of access to them.

1 SEARCHING TOOLS

1.1 File search tools

Finding a file manually in the complex directory structure of an ftp server can take a long time. To simplify and speed up this search, the Archie Internet search service was developed: a special Archie server stores the directory contents of anonymous ftp servers. In response to a search query, the Archie server returns a list of addresses of the anonymous ftp servers on which the required file is located.
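The principle of an Archie server can be illustrated with a short Python sketch: an index that maps each file name to the anonymous ftp servers and paths where it was found. The server names and listings below are hypothetical, and the function names are chosen for illustration only; a real Archie server built such listings by periodically scanning the servers.

```python
from collections import defaultdict

# Hypothetical directory listings of anonymous ftp servers (server -> file paths).
LISTINGS = {
    "ftp.example.org": ["/pub/tools/gzip.tar", "/pub/docs/rfc959.txt"],
    "ftp.example.net": ["/mirror/gnu/gzip.tar", "/misc/readme.txt"],
    "ftp.example.com": ["/pub/docs/rfc959.txt"],
}


def build_archie_index(listings):
    """Map each file name to the (server, path) pairs where it occurs."""
    index = defaultdict(list)
    for server, paths in listings.items():
        for path in paths:
            filename = path.rsplit("/", 1)[-1]  # keep only the last path component
            index[filename].append((server, path))
    return index


def archie_query(index, filename):
    """Return the sorted (server, path) pairs holding the requested file."""
    return sorted(index.get(filename, []))


if __name__ == "__main__":
    idx = build_archie_index(LISTINGS)
    for server, path in archie_query(idx, "rfc959.txt"):
        print(f"ftp://{server}{path}")
```

As in the text, the answer is only a list of locations; finding the right file on each server is still left to the user.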

The next task is to find the desired file among all the files on that server, which is quite difficult because file and directory names are often unclear. To solve this problem, the Gopher system is used: it lets you navigate through menus that describe the contents of files in understandable terms. There are many Gopher servers that hold data archives in the form of hierarchically structured directories, ordered by content. Working with them is very simple and resembles working with an ordinary file system.

An extension of this system, Veronica, holds the directories of all Gopher servers in its database. After a search query is entered, Veronica automatically scans all Gopher directories for the requested information, eliminating the need to search many Gopher servers manually.

Because of this way of navigating, Gopher was to some extent a forerunner of the WWW. Today the use of Gopher is declining as the use of the WWW grows.

1.2 WWW (World Wide Web) tools

By 1993 the WWW information retrieval system had been developed; its ease of navigation and accessibility opened the information sources of the Internet to inexperienced users. The WWW triggered an Internet boom that continues to this day, and the amount of information available on the Internet is doubling each year.

The WWW is based on the principle of hypertext (already familiar to the reader), that is, on a system of documents connected by hyperlinks. In hypertext, specially selected keywords in ordinary text act as links. These links send the user to other documents on the same server or on other servers, which may be located anywhere on the Internet. If the linked document is also hypertext, its own links lead further to other documents. Each redirection happens unnoticed by the user, who can therefore browse the content of the Internet by meaning, without worrying about the addresses of specific computers.
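Following hyperlinks from document to document amounts to traversing a graph of pages. The Python sketch below, assuming a small hypothetical link table in place of real Web pages, lists the documents reachable from a starting page and how many clicks each one takes.

```python
from collections import deque

# Hypothetical hypertext: each document maps to the documents it links to.
LINKS = {
    "home": ["history", "links"],
    "history": ["alexander-ii", "home"],
    "links": ["catalog"],
    "alexander-ii": ["reforms"],
    "catalog": [],
    "reforms": [],
}


def reachable(start, links):
    """Breadth-first traversal: return {document: number of clicks from start}."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, []):
            if target not in depth:  # visit each document only once
                depth[target] = depth[doc] + 1
                queue.append(target)
    return depth


if __name__ == "__main__":
    for doc, clicks in reachable("home", LINKS).items():
        print(f"{doc}: {clicks} click(s)")
```

Breadth-first order is used so that each document is reported with the smallest number of clicks needed to reach it.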

With the development of multimedia applications, originally purely hypertext documents are increasingly becoming hypermedia: WWW documents can contain data in any format, including text, graphics, sound/music, or video clips. Orientation and navigation on the World Wide Web are carried out with special programs called WWW browsers, which provide the user interface, such as Netscape Navigator or Microsoft Internet Explorer.

The starting point of an information search is, as a rule, the main (home) page of the information resource's site, reached by entering its address in the browser (for example, http://ncpi.gov.by or www.iparegistr.com). WWW sites are created and updated by companies or special organizations that publish information and monitor the content of their WWW pages. Use of the WWW is not purely passive, however: with special hypertext editors, every Internet user can create their own interactive WWW pages. This paved the way for the growing commercialization and expansion of the Internet.

Newly created information is now, as a rule, prepared with WWW access in mind, and older documents are gradually being converted, but around the world there are still millions of files in formats that do not meet WWW requirements. So that this information can also be used through the WWW, browsers provide access to the Internet services described above (telnet, ftp, Archie, Gopher). The WWW also gives access to communication services such as e-mail and NetNews. As a result, the WWW browser has become a universal communication program for the Internet.

With the advent of the WWW service, the Internet boom began. This easy-to-use, user-friendly environment for all services attracted many people and organizations to the Internet. It suddenly turned out that you did not need to be an Internet expert to use the network's services. This can be compared to Microsoft's success with Microsoft Windows as a graphical user interface: before Windows appeared, each DOS application had its own interface and user manual, so every application had to be learned separately.

2 BASIC TECHNIQUES OF FINDING INFORMATION ON THE INTERNET

2.1 Basic requirements for search

Search results must meet four requirements: complete coverage of resources, reliability of the information obtained, minimum time expenditure, and maximum search speed.

The requirement of complete coverage needs little explanation, except to note that a search should use not only WWW resources but also those of other Internet services.

Given the nature of the Internet, reliability of information becomes an extremely important requirement. Reliability can be assessed both by traditional methods (checking whether the material was legitimately published on paper, obtaining information about the organizations and authors, verifying the authenticity of their electronic resources, etc.) and with the help of the Internet itself (consulting alternative sources of information; cross-checking factual material; establishing how often it is used by other sources; determining the document's status and the source's rating through search engines; obtaining information about the author's competence and status through special Internet search services; analyzing individual elements of the site's organization to assess the qualifications of the specialists who maintain it; and more).

Search time, not counting delays caused by the connection itself, depends largely on search planning and on the individual searcher's skill and experience with the selected type of resource. Search planning means determining which search services are needed to answer a search request and in what order to apply them.

As already noted, information on the Internet is available from various types of sources. First of all, these are WWW resources (the hypertext system, resource directories, search engines). In addition, there are e-mail and mail robots, Usenet and other newsgroups, and ftp systems and archives (searched with Gopher and Veronica), all already familiar to the reader. The WWW lets you find the required resources through its hypertext properties: the available search engines follow hyperlinks automatically, while manual browsing remains possible. The WWW has a number of general and specialized search services.

Resource directories are databases of Internet resource addresses covering a variety of topics. They usually have a familiar hierarchical structure and some means of searching it. Most of these directories are maintained by classification specialists, which implies a certain subjectivity in the selection of information: on the one hand, this provides some guarantee of reliability; on the other, some information may be omitted, and resources may be added to the directory late.

Search engines are a mechanism for automatically building links (indexes) to various resources. They may target global, specialized, or local resources. In essence, they are powerful information retrieval systems (IRS) that use special robot programs (so-called "spiders") to continuously and automatically collect information on the Internet. The specialized databases built this way let users retrieve information by submitting requests in special query languages. True, the coverage of the information examined depends on the algorithms used and leaves much to be desired even for powerful search engines.
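The index such a system builds can be sketched as an inverted index: for every word, the set of pages that contain it. In the Python sketch below, the pages are hypothetical stand-ins for what a spider would have collected, and the tokenizer is deliberately naive; real engines add ranking, morphology, and much more. The query function intersects the page sets, which also shows why adding keywords narrows the results.

```python
import re
from collections import defaultdict

# Hypothetical pages already collected by a spider (URL -> text).
PAGES = {
    "http://a.example/1": "Reforms of Alexander II and the abolition of serfdom",
    "http://b.example/2": "History of the Russian Empire",
    "http://c.example/3": "Alexander II: judicial reforms of 1864",
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """Inverted index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search_all(index, query):
    """Return the URLs that contain every word of the query (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    idx = build_index(PAGES)
    print(sorted(search_all(idx, "reforms alexander")))
    print(sorted(search_all(idx, "reforms alexander 1864")))
```

Each extra word can only keep or shrink the result set, since the intersection never adds pages.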

E-mail is used both on the Internet in general and on the WWW. E-mail addresses published on web pages end up in search engine indexes and can be found with search engines.

Mail robots are special programs that perform certain actions in response to commands they receive by e-mail. Their main purpose is to send data on request when it is not available any other way, and to serve as an alternative to working online with known resources such as ftp archives. A mail robot's address has the usual e-mail format. In searching, mail robots are usually used only as intermediaries for obtaining information, although sometimes they turn out to be the only way to get the information needed.

Usenet and other regional and specialized newsgroups are electronic "bulletin boards" where users post information in one of the thematic newsgroups, which is then distributed to subscribers of that topic. This resource is most valuable for quickly accumulating information on a narrow question; in searching, it is used more often to obtain private, unofficial information.

Resources available via telnet sometimes contain completely unique information, primarily the library catalogs of European and American universities and of government agencies.

As already noted, the ftp file archive system holds a fairly extensive body of valuable information that has not yet been transferred to the WWW. Ftp archives are primarily sources of software. Searching them can be worthwhile if you know how the archives are structured: their file systems and the names of the files and directories that contain the required resources.

2.2 Methods for finding information on the Internet

You can search for the information you need on the Internet in different ways:

Search with search engines by keyword

Search with search engine classifiers

Directories and collections of links (for more general topics)

Conferences, chats

Pages of links ("Links") on thematic sites (for rare, specialized topics)

Non-network methods (advice from friends, acquaintances; advertising in print media)

At the beginning of a search, it is necessary to determine the type of information sought. Conventionally, four types can be distinguished:

Type 1: general (for example, the history of the Russian Empire);

Type 2: less general (for example, Emperor Alexander II);

Type 3: specific (for example, the reforms of Alexander II);

Type 4: more specific (for example, the abolition of serfdom).

The search path is chosen according to the type of information.

Type 1 information is searched for with search engine classifiers (among Russian-language engines, Yandex, www.Yandex.ru, is recommended). If sites with the required information are not found immediately, look through the directories and link pages ("Links") on sites with a similar subject found through the classifier. Such sites are listed in the classifier by topic and in the directories it leads to.

Type 2 information is searched for in the same way as type 1, but with more emphasis on directories and link pages.

Type 3 information is searched for by keywords entered in the search bar of search engines, directories, and link pages.

Type 4 information is searched for by entering detailed data in the search bar; these details themselves are found with the methods described for types 2 and 3.

Search by type 1. Required information: "History of the Russian Empire".

In Yandex, we go to Science and Education / Social Sciences / History. From the topic descriptions, we find the site http://rus-hist.on.ufanet.ru. If it does not contain the necessary information, we go to this site's links page. It contains links to resource directories: www.history.ru and http://www.lants.tellur.ru/history/index.htm. Sites on the given topic will most likely be found there.

Search by type 2. Required information: "Emperor Alexander II".

The search is carried out like the previous one, but with more attention to the directories www.history.ru and http://www.lants.tellur.ru/history/index.htm.

Search by type 3. Required information: "Reforms of Alexander II".

Here a new search method appears: keyword search. We type "Reforms of Alexander II" into the Yandex search bar. The result is 1,790 pages on 170 sites, including directories. To narrow the results, you can add further keywords (additional facts) and search within the sites already found, for example "1860-1870". In other search engines, the whole phrase "Reforms of Alexander II in 1860-1870" is typed. The "Links" pages on the sites found can also be used to find this information.

2.3 Development of an information resource

Like other information technologies, the Internet is created by developers, but here these are mainly the creators of resources: specialists who support the hardware and software, designers, artists, editors, and, most importantly, the authors of information resources. Naturally, resources are not created as an end in themselves; they are in demand among network users, that is, among the same specialists and resource consumers, among whom, as already noted, a new group is emerging: specialists in data mining and information search. Internet information resources, like other information resources, including non-electronic ones (in particular, the mass media), pass through certain stages in their life cycle (Fig. 9.3).

The resource originates in accordance with the needs of society and its capabilities (in particular, those related to the level of the technical and social state of society).

As far as its possibilities allow, the resource "matures" and takes shape, or disappears if there is no demand for it; disappearance here need not be physical (the site may still exist), but means that the resource is no longer in demand.

When demand reaches a certain level (aided by the efforts of the site's authors), the resource is cataloged: information about it appears in the various directories corresponding to its type.

Indexing, that is, the appearance of a resource in the indexes of search engines, occurs when a certain amount of information content and demand are reached.

If there is a constant growth in demand, the resource is constantly developing, otherwise the resource dies out and gradually disappears from the indexes and catalogs.

2.4 Requirements for search tools

As noted earlier, the inherent features of a professional search are completeness, reliability, and high speed. The most serious and non-trivial factor determining how quickly the search goal is reached is the planning of the search procedure. Planning requires, on the one hand, choosing the types of resources potentially capable of carrying information relevant to the search task and, on the other, selecting the search tools that serve the corresponding information field according to their expected performance. For the WWW, today the richest information space, the relative abundance of search tools means that most practical problems can be solved in several ways. The optimal sequence of tools at each stage of the search determines its effectiveness. A clear idea of the types, purpose, and working features of Internet information retrieval systems (IRS) can help solve this problem of choice.

Search engines and directories are the actual carriers of information about the resources available on the Internet. Internet information retrieval systems differ, but a principle of information selection is present, to one degree or another, both in a search engine's scanning program and in the work of the specialists who do the cataloging. As a rule, two main indicators are used to distinguish these systems: spatial scale and specialization.

When building its information array, a search engine may track updates to a predefined set of documents, directories, or a finite number of nodes selected by some principle. Such systems on the Internet can, somewhat loosely, be called local. Global search engines, unlike local ones, solve a more laborious task: covering as fully as possible the resources of the entire information field of the Internet (WWW or other) that they serve. As a consequence, the mechanism such a system uses to keep increasing the number of sites it scans plays an ever greater role.

Building regional and specialized search services requires active filtering of information. A search engine can specialize in a particular profile or topic, be it law, searching for people, or multimedia files in MP3 format, on either a global or a local scale. Naturally, such a system is easier to build and maintain over a limited set of updated sites, which is what is usually done in practice.

Regional search services filter information mainly by the server's top-level domain name, for example .by for Belarus and .ru for Russia. A serious drawback of such systems is that they miss the many resources that regional authors place directly in the .com domain.
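A minimal Python sketch of regional filtering by top-level domain follows. Apart from http://ncpi.gov.by, the URLs are hypothetical, and the function names are illustrative assumptions rather than any search service's actual interface.

```python
from urllib.parse import urlparse

# Hypothetical result URLs (except ncpi.gov.by), used only for illustration.
RESULTS = [
    "http://ncpi.gov.by/",
    "http://www.example.ru/history",
    "http://www.example.com/belarus-law",
    "http://portal.example.by/news",
]


def top_level_domain(url):
    """Return the last label of the host name, e.g. 'by' for 'ncpi.gov.by'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def filter_by_region(urls, tld):
    """Keep only URLs whose host name ends in the given top-level domain."""
    return [u for u in urls if top_level_domain(u) == tld]


if __name__ == "__main__":
    # The .com page about Belarusian law is dropped even though it is relevant.
    print(filter_by_region(RESULTS, "by"))
```

The .com page is filtered out even though it is regionally relevant, which is exactly the weakness of domain-based filtering.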

Global search services also often take regional features into account. The Lycos system, for example, ranks results according to the region of the query.

By its nature, the Internet is accompanied by information chaos. Only modern automatic document indexing tools, within the limits of their algorithms and hardware, can find a rational kernel in this chaos. Using Internet resources without keyword search resembles surfing rather than serious work with information.

2.6 Global WWW search engines

After trying several global search engines, a user usually settles on one or two to work with from then on. The choice of search service is often quite arbitrary, based not on an analysis of the systems' actual capabilities but on their popularity. One of the largest and most popular is AltaVista. AltaVista has a flexible query language, which, however, takes some study. It supports a multilingual search index and can translate the text of a Web page on-line (that is, directly during the session) from common European languages into English.

Another well-known system is Northern Light, which has a fairly standard set of functions. It additionally offers a unique collection of links (more than 6,000), mainly to articles from periodicals. Its index supports the Cyrillic alphabet (including Russian), which makes it, together with AltaVista, a good complement to the Russian regional search engines Rambler, Yandex, and Aport for Russian-language searches.

Finding and gathering information on the Internet requires planning. Faulty query logic, a poorly ordered sequence of search tools, hasty attempts to speed up the search: all of this not only delays the result but can undermine the purpose of the search work.

Let us dwell on several important points related to planning and the first steps of such work.

Start with a thorough lexical analysis of the information sought. Any sufficiently reliable and detailed description of the question under investigation can serve as primary information; a highly specialized reference book and a general electronic encyclopedia are both suitable sources. From the material studied, form the widest possible set of keywords: individual terms, phrases, professional vocabulary, slang, clichés, and set expressions, in several languages if necessary. Possible refinements of the query should be identified in advance: rare words, synonyms and antonyms, and names and surnames closely related to the question. It is also advisable to anticipate irrelevant responses, that is, the likely characteristics of the search noise. Once these preliminary data have been accumulated, you can proceed to obtaining primary information from the Internet.
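The keyword set produced by lexical analysis can be turned into test queries mechanically. In the Python sketch below, the synonym groups and noise words are hypothetical: each group holds interchangeable terms for one concept, every combination becomes a candidate query, and results whose titles contain an anticipated noise word are discarded. The queries are plain word lists; no particular search engine's syntax is assumed.

```python
from itertools import product

# Hypothetical synonym groups: one term is taken from each group per query.
KEYWORD_GROUPS = [
    ["reforms", "transformations"],
    ["Alexander II", "Tsar Alexander II"],
]

# Hypothetical words that signal irrelevant results ("search noise").
NOISE_WORDS = {"hotel", "tickets"}


def query_variants(groups):
    """Return every combination that takes one term from each synonym group."""
    return [" ".join(terms) for terms in product(*groups)]


def drop_noise(titles, noise):
    """Discard result titles that contain any anticipated noise word."""
    return [t for t in titles if not noise & set(t.lower().split())]


if __name__ == "__main__":
    for q in query_variants(KEYWORD_GROUPS):
        print(q)
    print(drop_noise(["Reforms of Alexander II", "Alexander Hotel tickets"], NOISE_WORDS))
```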

The main task of this stage is to take into account the peculiarities of the Internet, which carries not only technologies but also its own traditions and ethics. Network vocabulary, slang, and even the spelling of common words may differ from standard usage.

Information on whether the required data exist on the Internet is best sought in a previously known directory that supports keyword search. For simple tasks such as "Get the text of the Constitution of the Republic of Belarus" or "Which legal acts mention the name of my hometown?", a directory is a faster way of obtaining information than an automatic index and also provides greater reliability.

After the lexical analysis of information, the technological stage begins. The choice of the information field of the Internet and search tools is based on the above approaches.

Test queries of one or two keywords or phrases are used first, and the number of responses is analyzed. Analyzing the content of the responses lets you adjust the queries according to their relevance. Testing identifies the most representative sources of information, after which the sequence of search tools should be refined. This concludes the planning phase.

In conclusion, regional and specialized search services play a significant role in collecting information from the Internet. Using global indexes not to search directly for the necessary information but to locate these specialized search tools often shortens the time needed to solve a given search problem.

CONCLUSION

Taking into account all of the above, we can try to define the essence of the Internet in one word: communication, between individuals and entire nations, without the intervention of government authorities. This new technology is changing the face of civilization at tremendous speed and fundamentally changing humanity's idea of the world and of itself. The Internet has already drawn in tens of millions of people in more than a hundred countries and has completely changed how information is disseminated and perceived. In our century of information technology, the virtual reality of the Internet, which erases state borders, shortens geographical distances, and removes barriers between cultures, is becoming no less tangible than the material world around us.

With the development of the Internet, it has become possible to search for the necessary documentary information quickly and conveniently. You no longer have to select and study a huge amount of literature in bookstores and libraries: information can be obtained without leaving your home or office. All you need is a computer connected to the Internet with a special program installed, a browser, for viewing the content of Web pages.

Thanks to the variety of search engines designed for the average user, anyone can easily cut off the flow of obviously unnecessary information simply by formulating the purpose of the search correctly.

The data collection methods used differ depending on the type of research being conducted: primary or secondary.

In secondary marketing research, methods of searching the Internet for the necessary information come to the fore. The main tools for this today are search engines and directories. When these are not effective enough, a "manual" search of thematic sites, "yellow pages", and other resources is used. Search methods are discussed in more detail in a later section of this chapter.

In the case of collecting primary information, the main methods of data collection are Internet surveys, observation and experiments.

Internet surveys. The most widespread form of Internet survey is the questionnaire. A questionnaire is a set of questions to be answered by the respondents, that is, the persons selected for the survey. Because this tool is very flexible and versatile, it is the most common means of collecting primary data.

Fig. 3 shows a fragment of a questionnaire that users are asked to fill out when they decide to use the free e-mail service on the site.

Fig. 3.

As with traditional questionnaires, the questionnaires used in each Internet study must be carefully developed and tested beforehand. An unprofessional approach to compiling them inevitably either distorts the real picture or yields results that cannot be reasonably interpreted.

A later section of this chapter, "Conducting Internet Surveys", is devoted to this type of research.

Observation is a form of marketing research used to study the behavior of an object or subject systematically. Unlike a survey, observation does not depend on the observed object's willingness to provide information; it is the open or hidden collection and recording of events or particular moments of that object's behavior. Objects of observation can be, for example, the characteristics and behavior of buyers.

This method includes marketing research conducted by firms that have their own web server. It consists in collecting and then analyzing data obtained from the web server's log files or through cookies. These data may concern visitor behavior, the order in which visitors move between pages, or web server traffic statistics. If a search engine is installed on the website, the queries users enter can be collected and analyzed as well.
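Log-file analysis of this kind can be sketched in Python, assuming the server writes the widely used Common Log Format (host, identity, user, time, request line, status code, size). The sample lines are hypothetical and use documentation IP addresses; counting distinct hosts is only a rough stand-in for counting visitors, since many users can share one address.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical log lines (documentation IP addresses).
SAMPLE_LOG = """\
192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
192.0.2.1 - - [10/Oct/2000:13:56:02 -0700] "GET /catalog.html HTTP/1.0" 200 1500
198.51.100.7 - - [10/Oct/2000:14:01:10 -0700] "GET /index.html HTTP/1.0" 200 2326
198.51.100.7 - - [10/Oct/2000:14:01:30 -0700] "GET /old.html HTTP/1.0" 404 0
"""


def page_statistics(log_text):
    """Count successful (status 200) page views and distinct client hosts."""
    views = Counter()
    hosts = set()
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip malformed lines
        hosts.add(match["host"])
        if match["status"] == "200":
            views[match["path"]] += 1
    return views, hosts


if __name__ == "__main__":
    views, hosts = page_statistics(SAMPLE_LOG)
    print(views.most_common())
    print(len(hosts), "distinct hosts")
```

Because the pattern is applied with `match()` from the start of each line, lines in the Combined Log Format, which append the referrer and user agent, are accepted as well.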

The ability to analyze server traffic statistics is one of the most effective marketing tools. Unlike surveys, which require the active participation of respondents, statistical analysis collects valuable information without requiring any action from visitors.

Experiment. From a scientific point of view, the most rigorous method is the experimental study, aimed at establishing cause-and-effect relationships. Subjects are specially selected and exposed to planned influences under controlled external conditions in order to reveal statistically significant differences in their reactions. To the extent that researchers manage to exclude or control irrelevant external factors, the observed effects can be attributed to the experimenters' influence on the object. Relationships between events established in this way can, after critical analysis, be considered causal, and the goals of the experiment achieved.

When we need to find information on the Internet, there are several ways to go about it. Most people use search engines, typing in a query and studying the search results. For certain purposes, specialized databases (DB) or directory sites are indispensable.

How you search is up to you, and the method you choose determines how much information you will have to process before finding what you need and how long it will take.

Let's briefly consider several search methods:

1. If you use a search engine to get general information about something, results running to millions of pages and many clicks through links should not put you off. But if your goal is specific information, problems can arise: this method does not guarantee the accuracy of the information and is time-consuming.

On the other hand, most search engines, such as Yandex and Google, allow you to narrow your search. First, you can use the advanced search filters. With the help of these filters, you can select the region you need, limit search results by the date the documents were updated, define the document language, and much more. Secondly, in the same Yandex, there is a "query language". Its essence is that to limit the search scope, you can use special operators that allow you to:

Get in the search results only documents containing the requested word in the given form

Clarify the presence and relative position of the requested words in the document

Limit search by file type, host, etc.
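The idea behind the second kind of operator, checking how close the requested words are to each other, can be imitated locally on a text that has already been downloaded. This Python sketch does not reproduce the syntax of the Yandex query language; the function and its word-distance parameter are hypothetical.

```python
import re


def within_distance(text, first, second, max_gap):
    """True if `first` and `second` occur at most `max_gap` words apart, in either order."""
    words = re.findall(r"\w+", text.lower())
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)


if __name__ == "__main__":
    doc = "The peasant reform of 1861 abolished serfdom in the Russian Empire"
    print(within_distance(doc, "reform", "serfdom", 4))
    print(within_distance(doc, "reform", "empire", 4))
```

Word positions are compared pairwise, which is simple but quadratic; that is acceptable for a single downloaded document.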

2. If you need information on a specific topic, a directory search will do. On directory sites, information is systematized and structured, divided into topics and subtopics, which makes it easier to find the section you need. These sites are edited by people, so the links they provide can usually be trusted. This method is not very effective for finding a specific document, but it works well when you need as much information as possible on a broad topic.

The Internet has a huge number of directories, both specialized ones devoted to a single broad topic and multidisciplinary ones.

As examples, consider two large multidisciplinary directories. DMOZ is one of the largest directories of Internet resources. Among purely Russian-language directories, list.mail.ru stands out. These resources are described in more detail below.

3. Database search is effective for thematic searches. Collecting the information we need from foreign and Russian-language information, educational and scientific resources often requires considerable effort and can cost a significant amount of money and time.

The Web hosts a huge number of databases (factual, bibliographic, full-text, objectographic, etc.), which, depending on the content stored in them, can be divided into universal, industry-specific and thematic ones. Bibliographic databases, for example, are essentially electronic counterparts of traditional printed bibliographic publications. As a rule, they describe every document according to a fixed, clearly defined set of criteria. This helps you find the information you need, especially when the task is to locate a specific article published in a periodical.
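The fixed description scheme of a bibliographic database is what makes field-based searching possible. The following Python sketch shows the idea on a few hypothetical records: each record has the same fields, and a query matches by field rather than by free text. The Record class, the find function and the sample data are illustrative assumptions, not the interface of any real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Every record is described by the same fixed set of fields.
    author: str
    title: str
    journal: str
    year: int


def find(records, **criteria):
    """Return records whose fields match all given criteria.

    String criteria match case-insensitively as substrings;
    other values (for example the year) must match exactly.
    """
    def matches(rec):
        for field, wanted in criteria.items():
            value = getattr(rec, field)
            if isinstance(wanted, str):
                if wanted.lower() not in value.lower():
                    return False
            elif value != wanted:
                return False
        return True

    return [r for r in records if matches(r)]


# Hypothetical sample records, for illustration only.
catalog = [
    Record("Ivanov I.", "Search engines and directories", "Information Resources", 2015),
    Record("Petrov P.", "Patent databases in practice", "Patent Review", 2016),
    Record("Ivanov I.", "Bibliographic description of e-sources", "Library Science", 2016),
]

for rec in find(catalog, author="ivanov", year=2016):
    print(rec.title)  # Bibliographic description of e-sources
```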


Example. Directories.

A directory of Internet sites, also called a directory of Internet resources or simply an Internet directory (English: web directory), is a structured set of links to sites, each with a brief description. Sites in the directory are organized by topic.
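As a rough illustration, a directory can be modeled as a tree of categories, each holding links with short descriptions. The Python sketch below, with hypothetical category names and example.org addresses, shows browsing down a category path and collecting every link under a topic, which is why directories suit broad thematic searches.

```python
# A hypothetical directory: nested categories, each holding (url, description) links.
directory = {
    "Science": {
        "_links": [("https://example.org/physics", "Physics news and articles")],
        "Biology": {"_links": [("https://example.org/bio", "Biology reference")]},
    },
    "Computers": {
        "_links": [],
        "Internet": {"_links": [("https://example.org/search", "About search engines")]},
    },
}


def browse(tree, path):
    """Walk down a category path such as ["Computers", "Internet"].

    Returns the links of that category and the names of its subcategories.
    Raises KeyError if a category on the path does not exist.
    """
    node = tree
    for name in path:
        node = node[name]
    return node.get("_links", []), [key for key in node if key != "_links"]


def all_links(node):
    """Collect links from a category and, recursively, all its subcategories."""
    links = list(node.get("_links", []))
    for key, child in node.items():
        if key != "_links":
            links.extend(all_links(child))
    return links


print(browse(directory, ["Computers"]))   # ([], ['Internet'])
print(len(all_links(directory["Science"])))  # 2
```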

1. The Open Directory Project (ODP), also known as DMOZ (from one of its first domain names, directory.mozilla.org), is a free multilingual directory of links to World Wide Web sites, maintained by an online community of volunteer editors.

At the time of writing, the site lists:

3,884,779 sites

Information is available in 90 languages, including Russian, German, English, Greek, French, Japanese, Korean, Italian and so on.

The site has 91,790 editors.

The main categories are: arts, business, computers, games, health, home, news, recreation, reference, regional, science, shopping, society, sports, and a directory for kids and teens. Each main category is divided into a large number of subtopics.

DMOZ is operated by AOL Inc. (an American media conglomerate, provider of online services and message boards). Governance is handled by a small team of professionals responsible for editorial policy and leadership, community governance and development, and systems engineering.

However, DMOZ is primarily a self-regulating community. Through self-governance, volunteer editors manage the directory's growth and development, while a system of checks and balances ensures high-quality content.

DMOZ is an open-source volunteer initiative. AOL Inc. manages it more like a non-profit organization and seeks to preserve its atmosphere as an open and free resource.

2. Among Russian-language directories, Catalog@Mail.ru stands out. As in DMOZ, the main topics are divided into smaller subtopics. There are 18 main categories in total: cars, the Internet, medicine and health, news and media, manufacturing, business and finance, computers, science and education, sports, and so on.

For the convenience of users, the links available in each section can be broken down by type of site (information, corporate, personal, service sites, private, information service).

You can also sort the results alphabetically, by date and by popularity.

Every day, Catalog@Mail.ru publishes a ranking of the day's most visited sites among those listed in the catalog.
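Sorting catalog entries in the three orders mentioned above is straightforward once each entry carries a name, a date and a visit count. This Python sketch uses hypothetical entries; the assumption that date sorting shows the newest entries first and popularity sorting the most visited first is ours, not necessarily how Catalog@Mail.ru orders them.

```python
from datetime import date

# Hypothetical catalog entries: name, date added, visits today.
sites = [
    {"name": "Auto Club", "added": date(2015, 9, 3), "visits": 2700},
    {"name": "Health Guide", "added": date(2016, 2, 10), "visits": 830},
    {"name": "News Portal", "added": date(2014, 5, 1), "visits": 5200},
]


def sort_sites(entries, by):
    """Return a new list sorted alphabetically, by date or by popularity."""
    if by == "alphabet":
        return sorted(entries, key=lambda s: s["name"].lower())
    if by == "date":
        return sorted(entries, key=lambda s: s["added"], reverse=True)   # newest first
    if by == "popularity":
        return sorted(entries, key=lambda s: s["visits"], reverse=True)  # most visited first
    raise ValueError(f"unknown sort order: {by}")


def top_of_day(entries, n=2):
    """A daily 'top' is simply the n most visited entries."""
    return sort_sites(entries, "popularity")[:n]


print([s["name"] for s in sort_sites(sites, "popularity")])  # ['News Portal', 'Auto Club', 'Health Guide']
```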


Example. Database.

Espacenet (previously often styled esp@cenet) is a free online service for searching patents and patent applications. It is developed by the European Patent Office (EPO) in cooperation with the member states of the European Patent Organization. Most member states can use Espacenet in their national language and have access to the EPO worldwide database, most of which is in English. In 2015, Espacenet reported holding records of more than 90 million patent publications.

The Espacenet project was first launched in 1998. It revolutionized access to international patent information by making it available to the public, permanently changing the way patents are distributed, checked and searched.

In 2012, the EPO launched the PatentTranslate project, a free online service for the automatic translation of patents. The service was created in partnership with Google and was "purpose-built to handle complex patent vocabulary." PatentTranslate covers 31 languages.

Since March 2016, Espacenet has offered full-text search across a database of English, French and German patent documents.

Example. Search engine.

A search engine is a computer system designed to search for information. One of the best-known applications of search engines is web services for finding text or images on the World Wide Web.

To search for information with a search engine, the user formulates a search query. The search engine's job is to find documents that contain either the specified keywords or words related to them in some way.

A search engine's architecture typically includes:

· a search robot (crawler) that collects information from Internet sites or other documents;

· an indexer that enables fast searching of the accumulated information;

· a search front end: the graphical interface through which the user works with the system.
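The three components listed above can be sketched as a toy pipeline in Python: a robot that follows links through a small hypothetical in-memory web, an indexer that builds an inverted index (word to set of pages), and a query step that returns pages containing every query word. This is a simplified model under those assumptions; real search engines also rank results, handle word forms and work at a vastly larger scale, and a graphical interface would sit on top of the search function.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": address -> (page text, outgoing links).
WEB = {
    "a.example": ("Patent search with Espacenet", ["b.example", "c.example"]),
    "b.example": ("Directory of sites: DMOZ", ["a.example"]),
    "c.example": ("Search engines index web pages", ["d.example"]),
    "d.example": ("Patent translation service", []),
}


def crawl(start):
    """Search robot: follow links breadth-first, visiting each page once."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """Indexer: map each word to the set of pages containing it (inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Query processing: return the pages that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = crawl("a.example")
index = build_index(pages)
print(sorted(search(index, "patent")))  # ['a.example', 'd.example']
```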

Users currently have a wide choice of search engines for finding the information they need: Google, Bing, Yahoo!, Yandex, Rambler and others.

Let us take Bing as an example. Bing is a relatively young search service: it was announced in 2009 and introduced by Microsoft CEO Steve Ballmer. Despite its youth, it already ranks second only to the search giant Google in popularity in some European countries and in North America.

Bing is most popular in countries such as the United States, China, Germany, India and the United Kingdom.

The search engine has a minimalist design; besides web search, you can use the "Pictures", "Videos", "Maps" and "News" categories. Bing also has its own translator based on Microsoft Translator. Another notable feature is that licensed Microsoft Office products can be used directly from the search engine, without installing them on a computer.

If you evaluate Bing as a source of information, you may encounter a feature that makes quick and accurate searching harder: the way its search algorithm treats keyword density. Whereas for successful promotion in other search engines site texts must contain 5 to 8% keywords, Bing considers a natural keyword density to be 3%. As a result, a query is more likely to return links to sites that do not contain the information you need.
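Keyword density is usually understood as the share of a keyword among all words of a text. Under that simple definition (an assumption; search engines do not publish their exact formulas), a 400-word page that uses a keyword 12 times has a density of 12 / 400 = 3%, and 28 uses give 7%. A minimal Python version:

```python
import re


def keyword_density(text, keyword):
    """Percentage of words in the text that equal the keyword.

    Assumed definition: occurrences / total words * 100, case-insensitive.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)


sample = "patent " * 3 + "other " * 97  # 100 words, 3 of them the keyword
print(keyword_density(sample, "patent"))  # 3.0
```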

Like its competitors, Bing has the ability to filter search results by time period, language, and region.
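Filtering by time period, language and region amounts to keeping only the results whose metadata satisfy every selected condition. The Python sketch below assumes each result carries hypothetical updated, lang and region fields; how a real engine stores and applies such metadata is not documented here.

```python
from datetime import date

# Hypothetical search results with metadata.
results = [
    {"url": "https://example.org/1", "lang": "en", "region": "US", "updated": date(2016, 11, 2)},
    {"url": "https://example.org/2", "lang": "de", "region": "DE", "updated": date(2016, 6, 15)},
    {"url": "https://example.org/3", "lang": "en", "region": "GB", "updated": date(2015, 1, 20)},
]


def filter_results(items, since=None, lang=None, region=None):
    """Keep results matching every filter that is set (None means no filter)."""
    out = []
    for r in items:
        if since is not None and r["updated"] < since:
            continue
        if lang is not None and r["lang"] != lang:
            continue
        if region is not None and r["region"] != region:
            continue
        out.append(r)
    return out


recent_english = filter_results(results, since=date(2016, 1, 1), lang="en")
print([r["url"] for r in recent_english])  # ['https://example.org/1']
```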


CONCLUSION

The Internet is a gigantic repository of data on all branches of human knowledge. Virtual libraries, archives and news feeds contain vast amounts of text, graphics, audio and video files: the worldwide computer network gives us access to a huge amount of open information. Being able to navigate this flow of information and find what you need is very important for a person of the 21st century.

Using specific examples, we have examined such types of open information sources on the Internet as catalogs, databases and search engines, and how information is searched within each of them.

Sources

1) Article "Collecting information from open sources" [Electronic resource] // Vsepoisk, 2015, URL: http://www.vsepoisk.ru/2009/03/blog-post_27.html (Date of access: 13.12.2016)

2) Official website of DMOZ [Electronic resource] // DMOZ, 2016, URL: http://www.dmoz.org/ (Date of access: 13.12.2016)

3) Official website of Catalog@Mail.ru [Electronic resource] // Mail.ru, 2016, URL: http://list.mail.ru/ (Date of access: 13.12.2016)

4) Official website of Espacenet [Electronic resource] // Espacenet, 2015, URL: http://www.epo.org/index.html (Date of access: 13.12.2016)

5) Article "We are looking in databases" [Electronic resource] // Vsepoisk, 2015, URL: http://www.vsepoisk.ru/2009/04/blog-post_08.html (Date of access: 13.12.2016)

2. Registration of the database

Before performing any operations on an existing database in IBExpert, you must register it. To do this, either use the "Database > Register Database" menu command or select the "Register after creation" option in the database creation window.

This opens the "Database parameters" dialog box (Fig. 4), in which you fill in almost the same fields as when creating the database, and then confirm the registration.

To check the parameters entered in the registration window, click the [Check connection] button. This opens the Communication Diagnostics dialog box, whose Test Results field shows the result of the connection attempt together with the version of the DBMS in use.

After registration, IBExpert saves all the information entered about the database, and a node for the registered database is added to the "Database Explorer" window on the "Databases" tab (Fig. 5)*.

To connect to a registered database, select it in the list (Fig. 5) and run the "Database > Connect to Database" command, or double-click the selected database.

If all connection parameters have been entered correctly, the connection is established: the name of the connected database is shown in bold in the "Database Explorer" window, and nested nodes with the objects contained in the database appear.

After connecting to the database, you can view existing objects, create new ones, enter and view data, and also perform operations with existing objects.

Procedure:

1. Install Visual Studio on your computer. It is worth installing an extended edition, since SQL support is not included in the standard one.

2. Open Visual Studio and select "Tools" → "Connect to Database".

3. In the window that appears, select the data source type "Microsoft SQL Server Database File" → "Continue".

4. In the next window, choose where to store the database on the computer and click "OK". The file of the new database then appears in the list of files on the right side of the screen.

5. Double-click the database file. In the list that opens, right-click "Tables" → "New". An empty table appears.

6. Fill in the table according to the previously created model; in effect, we transfer the model to SQL. To create another table, repeat step 5.

7. Set one of the table fields as the key: select the field (most often key fields are the ones containing an id) and click the key icon on the toolbar.

8. Select a data type for each line (column definition). The data type determines how a field stores information; once it is set, data of a different type cannot be entered. For text values, the types CHAR(M), VARCHAR(M), TINYBLOB, TINYTEXT, BLOB, TEXT, MEDIUMBLOB, MEDIUMTEXT, LONGBLOB and LONGTEXT can be used, depending on how much data the field is expected to hold. For numbers, use BOOLEAN, INTEGER, DECIMAL, FLOAT, REAL or DOUBLE PRECISION. For dates and times, use DATE, TIME, TIMESTAMP or DATETIME. Binary data can be stored as "binary", "image" or "varbinary". Other data types: "cursor", "hierarchyid", "sql_variant", "table", "timestamp", "uniqueidentifier", "xml" and the spatial types. Note that the string, numeric and date type names listed above follow MySQL naming; SQL Server itself uses, for example, char, varchar, nvarchar and text for strings, bit, int, decimal, float and real for numbers, and date, time, datetime and datetime2 for dates and times.
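The same idea (a key field plus a data type for every column) can be tried without Visual Studio using SQLite from Python's standard library, here standing in for SQL Server as an assumption. The table and column names are hypothetical. Note that SQLite applies type affinity and, for most columns, does not strictly reject values of another type the way SQL Server does, but it does enforce the uniqueness of the primary key, as the sketch shows.

```python
import sqlite3

# In-memory database; SQLite is used here only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id         INTEGER PRIMARY KEY,   -- key field: unique for every row
        name       VARCHAR(100) NOT NULL,
        birth_date DATE,
        avg_grade  REAL
    )
""")
conn.execute(
    "INSERT INTO student (id, name, birth_date, avg_grade) VALUES (?, ?, ?, ?)",
    (1, "Anna", "2001-04-12", 4.5),
)

try:
    # A second row with the same key value must be rejected.
    conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (1, "Boris"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)                                   # True
print(conn.execute("SELECT name FROM student").fetchall())  # [('Anna',)]
```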



14. Using the Internet to Collect Sources. Scientific cooperation.

Modern approaches to using computer networks involve information exchange between participants in the educational process in various modes of operation of the world information environment. Internet technology gives users all the resources of global telecommunications and makes it possible to organize educational activities with the application and tool software available today. In this regard, a promising direction is developing the scientific and pedagogical foundations for creating and using a global information environment for lifelong education, based on a unified educational space (an information and subject environment) on a regional or global scale.

Among the many information and telecommunication technologies, and the ways of organizing data sent over communication channels, the worldwide computer network, the Internet, occupies a central place. Moreover, it is practically the only global telecommunication network used throughout the general secondary education system today. This is largely due to the speed and reliability with which the Internet transmits data in various formats (text, graphics, sound, video, etc.). The Internet provides collective access to educational materials, which can take the form of simple textbooks (electronic texts) or complex interactive systems, computer models, virtual learning environments, etc. The number of users and information sources on the Internet keeps growing, and the quality of telecommunication services keeps improving.



Informatization is one of the main factors driving the improvement of education. The content and methods of teaching are changing, and so is the role of the teacher, who is gradually turning from a mere transmitter of knowledge into an organizer of learners' activities aimed at acquiring new knowledge, skills and abilities. Educational information resources published on the Internet are an essential means of informatization. They make it possible to:

· use information from educational and scientific websites to prepare teaching and methodological materials, abstracts and reports;

· give an educational institution a presence on the Internet;

· create a website devoted to a school subject and publish it on the Internet;

· host personal websites of teachers and students.

The way to humanity's vast store of information held in libraries, music libraries and film libraries lies through catalog cards. The Internet has similar mechanisms for finding the information you need: search engines, which serve as a starting point for users. In terms of content, they can be regarded as yet another special Internet service.

There are many catalogs and portals on the Internet that collect information that can be used by teachers. The use of such catalogs and information resources on the Internet is advisable for:

· Prompt provision of teachers, trainees and parents with relevant, timely and reliable information corresponding to the goals and content of education;

· Organization of various forms of trainees' activities related to independent mastery of knowledge;

· Application of modern information and telecommunication technologies (multimedia technologies, virtual reality, hypertext and hypermedia technologies) in educational activities;

· Objective measurement, assessment and forecast of the effectiveness of training, comparison of the results of educational activities of schoolchildren with the requirements of the state educational standard;

· Managing each student's learning activity in line with his or her level of knowledge, abilities and skills, and the particular features of his or her motivation for learning;

· Creation of conditions for individual self-study of schoolchildren;

· Constant and operational communication between teachers, trainees and parents, aimed at increasing the effectiveness of training;

· Organizing the effective operation of general education institutions in accordance with the normative provisions and meaningful concepts adopted in the country.

A variety of Internet information resources may be appropriate for use in general secondary education. Among them are educational Internet portals, which themselves serve as catalogs of resources; service and tool software; electronic versions of paper publications; electronic learning tools and tools for measuring learning outcomes; and resources containing news, announcements and means of communication among participants in the educational process.

Using Internet information resources, teachers can manage the cognitive activity of preschoolers more effectively, quickly track the results of education and upbringing, take reasonable and timely measures to improve learning and the quality of students' knowledge, purposefully improve their teaching skills, and obtain prompt, targeted access to the educational, methodological and organizational information they need. Educators who develop their own information resources gain an additional opportunity to use fragments of educational resources published on the network, making the necessary

Most of the highest-quality information resources that could make general secondary education more effective are cataloged on educational Internet portals. Russia has already developed an organizational scheme for creating a system of educational portals, which has its own characteristics. This scheme includes:

· the horizontal portal "Russian Education" (www.edu.ru);

· subject-specific vertical portals by field of knowledge: humanities, economics and social sciences, natural sciences, engineering, pedagogy, medicine, agriculture, etc.;

· specialized vertical portals: book publishing, the Unified State Exam, education news, etc.

The horizontal portal "Russian Education" provides:

· Navigation through all vertical portals;

· Search for multimedia information in the field of education on the Internet;

· Personalization of the interface, both by choosing a user category (learner, teacher, administrator, portal developer) and level of education, and by customizing the interface design;

· Formation and provision of cross-sections of vertical portals by levels of education;

· Storage and provision of information in the field of education (legislation, orders, regulations, standards, lists of specialties, a federal set of textbooks, a database of universities, etc.);

· Publishing a daily press review on education;

· News feed in the field of education;

· Organization of forums, discussion groups, mailing lists.

From the world of science, Demoscope, Scopus

INTRODUCTION

The Internet is like a huge world library, which has only one, but significant difference: to search for a book in the library there is a catalog, in extreme cases, you can turn to an experienced librarian. There is no complete catalog of the Internet. But, nevertheless, search in the global computer network is possible, and this, perhaps, is one of the most important aspects of it. To search for data in the network, special servers are used, the information on which is maintained and updated almost automatically.

Today, when the Internet has become one of the main sources of information, Internet search is gaining more and more practical value. But with the rapid increase in the amount of available data, the search procedure itself becomes more and more complicated.

The Internet is a global computer network that connects both users of computer networks and users of individual PCs. Slowly but surely, it is becoming the main means of corporate communication, second so far only to the telephone.

There is a huge amount of information resources on the Web. According to some estimates, the number of documents has exceeded 65 million and continues to grow rapidly. Such a volume of information requires a properly organized search process and special technical means, such as search engines. A simple search for a fairly common keyword usually yields anywhere from tens of thousands to several million links. Working with so many documents is practically impossible, especially since the overwhelming majority of them contain information irrelevant to the query.

Sources of information on the Internet differ in the way information is presented and, consequently, in the method of access to them.

1 SEARCHING TOOLS

1.1 File search tools

Finding a file manually in the complex directory structure of an ftp server can take a long time. To simplify and speed up the search, the Archie search service was developed: a special Archie server that stores the directory listings of anonymous ftp servers. In response to a search query, an Archie server returns a list of addresses of the anonymous ftp servers on which the required file is located.
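The core of an Archie-style service is a prebuilt index of file names across many ftp servers, so a query returns server addresses instead of requiring a manual walk through directories. The Python sketch below uses a hypothetical index; the glob-style pattern matching is a simplification chosen for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical Archie-style index: ftp server -> paths of the files it offers.
ARCHIE_INDEX = {
    "ftp.example.edu": ["/pub/gnu/emacs-19.tar.gz", "/pub/docs/readme.txt"],
    "ftp.example.org": ["/mirror/gnu/emacs-19.tar.gz"],
    "ftp.example.net": ["/pub/games/chess.zip"],
}


def archie_search(pattern):
    """Return (server, path) pairs whose file name matches the glob pattern."""
    hits = []
    for server, paths in ARCHIE_INDEX.items():
        for path in paths:
            filename = path.rsplit("/", 1)[-1]  # match on the file name only
            if fnmatchcase(filename, pattern):
                hits.append((server, path))
    return hits


for server, path in archie_search("emacs*"):
    print(server, path)
```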

The next task is to find the desired file among the many files on that server, which is quite difficult because file and directory names are often unclear. The Gopher system solves this problem: it lets you navigate a system of menus that describe the contents of files in understandable terms. Many Gopher servers hold data archives in the form of hierarchically structured directories ordered by content. Working with them is very simple and resembles working with an ordinary file system view.

Veronica is an extension of this system whose database contains the directories of all Gopher servers. After you enter a search query, Veronica automatically scans all Gopher directories for the information you are looking for, so you do not have to search many Gopher servers manually.

Because of this way of navigating, Gopher was to some extent the forerunner of the WWW. Today, Gopher use is declining as use of the WWW grows.

1.2 WWW (World Wide Web) tools

In 1993, the WWW information retrieval system became widely available; thanks to its easy navigation and accessibility, it opened the information sources of the Internet to inexperienced users. The WWW triggered an Internet boom that continues to this day, and the amount of information available on the Internet doubles every year.

The WWW is based on the principle of hypertext (already familiar to the reader), that is, on a system of documents connected by hyperlinks. In hypertext, selected words of ordinary text are highlighted as links. These links take the user to other documents on the same server or on other servers anywhere on the Internet. If the target document is also hypertext, its links lead further to other documents. Each jump happens transparently, so the user can browse the content of the Internet by meaning, without worrying about the addresses of specific computers.
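Following hyperlinks starts with finding them in a page. The Python sketch below uses the standard html.parser and urllib.parse modules to collect the targets of <a href> links and to resolve relative links against the page's own address; the sample page and the example.com/example.org addresses are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the targets of <a href="..."> hyperlinks in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own address.
                    self.links.append(urljoin(self.base_url, value))


page = """
<p>See the <a href="/catalog/science.html">science section</a> or
an <a href="http://www.example.org/patents">external server</a>.</p>
"""
parser = LinkExtractor("http://www.example.com/index.html")
parser.feed(page)
print(parser.links)
# ['http://www.example.com/catalog/science.html', 'http://www.example.org/patents']
```

A crawler would repeat this step on every page it fetches, adding newly found links to its queue.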

With the development of multimedia applications, documents that were originally pure hypertext are increasingly becoming hypermedia. Thus, WWW documents can be in any data format: text, graphics, sound/music, or video. Orientation and navigation on the World Wide Web are handled by special programs called WWW browsers, which provide the user interface, such as Netscape Navigator or Microsoft Internet Explorer.

The starting point of an information search is, as a rule, the main (home) page of the information resource, which you reach by entering its address in the browser (for example, http://ncpi.gov.by or www.iparegistr.com). WWW sites are created and updated by companies or dedicated organizations that publish information and monitor the content of their pages. Using the WWW need not be passive, however: any Internet user can create their own interactive WWW pages with special hypertext editors. This paved the way for the growing commercialization and expansion of the Internet.

Today, new information is usually created with WWW access in mind, and older documents are gradually being converted, but millions of files around the world are still stored in forms that do not meet WWW requirements. To make this information accessible through the WWW as well, browsers provide access to the Internet services described above (telnet, ftp, Archie, Gopher). The WWW also gives access to communication services (e-mail, NetNews). The WWW browser has therefore become a universal communication program for the Internet.
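A browser decides which service to use from the scheme at the start of an address (http:, ftp:, gopher:, telnet:, mailto:, news:). The Python sketch below illustrates that dispatch with the standard urllib.parse.urlsplit function; the mapping table is an illustrative assumption and does not reflect the behavior of any particular browser. Note that many current browsers no longer support ftp: and gopher: addresses.

```python
from urllib.parse import urlsplit

# Illustrative mapping from URL scheme to the Internet service it reaches.
SERVICES = {
    "http": "WWW (HTTP)",
    "https": "WWW (HTTP over TLS)",
    "ftp": "FTP file transfer",
    "gopher": "Gopher menus",
    "telnet": "Telnet remote session",
    "mailto": "e-mail",
    "news": "Usenet news (NetNews)",
}


def service_for(url):
    """Pick the service from the URL scheme; urlsplit lowercases the scheme."""
    scheme = urlsplit(url).scheme
    return SERVICES.get(scheme, "unknown service")


for address in ["gopher://gopher.example.edu/1/", "mailto:user@example.org", "www.example.com"]:
    print(address, "->", service_for(address))
```

An address typed without a scheme, such as www.example.com, has an empty scheme here; browsers typically fill in http:// or https:// themselves.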

With the advent of the WWW service, the Internet boom began. This easy-to-use, user-friendly environment for all services attracted many people and organizations to the Internet. It suddenly turned out that you did not need to be an Internet expert to use network services. This can be compared to Microsoft's success in releasing Windows as a graphical user interface: before Windows appeared, each DOS application had its own user manual, so each application had to be learned separately.

2 BASIC TECHNIQUES OF FINDING INFORMATION ON THE INTERNET

2.1 Basic requirements for search

Search results must meet the following requirements: complete coverage of resources, reliability of the information obtained, minimum time spent and maximum search speed.

The requirement of complete coverage needs no further explanation, except to note that a search should use not only WWW resources but also other Internet services.

Given the nature of the Internet, the reliability of information is becoming an extremely important requirement. Reliability can be assessed both by traditional methods (checking the legitimacy of print publications, obtaining information about organizations and authors, verifying the authenticity of their electronic resources, etc.) and by using the capabilities of the Internet itself (consulting alternative sources of information, cross-checking factual material and establishing how often it is used by other sources; determining the status of a document and the rating of its source with search engines; obtaining information about the competence and status of the author with special Internet search services; analyzing individual elements of a site's organization to assess the qualifications of the specialists who maintain it, and more).
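One of the Internet-based checks mentioned above, reconciling a fact across alternative sources, can be sketched in Python: group the values reported for the same fact and count how many distinct hosts state each one, so that several pages on a single site do not count as independent confirmation. The function name and the sample reports are hypothetical, and agreement between sources is only one signal of reliability, not proof.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def reconcile(reports):
    """Count, for each reported value of a fact, how many distinct hosts state it.

    reports: list of (source_url, value) pairs collected from several sites.
    Returns (value, host_count) pairs, most widely confirmed value first.
    """
    hosts_by_value = defaultdict(set)
    for url, value in reports:
        hosts_by_value[value].add(urlsplit(url).hostname)
    return sorted(((v, len(h)) for v, h in hosts_by_value.items()),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical reports of the year a service was launched.
reports = [
    ("https://a.example/history", 1998),
    ("https://a.example/about", 1998),   # same host: counted only once
    ("https://b.example/article", 1998),
    ("https://c.example/post", 1997),
]
print(reconcile(reports))  # [(1998, 2), (1997, 1)]
```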

Search time, excluding the time spent on establishing a connection, depends largely on search planning and on the searcher's familiarity with the type of resource selected. Search planning means determining which search services are needed to resolve a query and in what order to apply them. Much also depends on the skills and experience of the individual search specialist.

As already noted, information on the Internet is available from various types of sources. First of all, these are WWW resources (the hypertext system, resource catalogs, search engines). In addition, there are e-mail, mail robots, Usenet and other newsgroups already familiar to the reader, as well as ftp systems and archives (accessed with Gopher and Veronica). The WWW lets you search for resources using its hypertext properties: existing search engines follow hyperlinks automatically, while manual browsing remains possible. The WWW offers a number of general and specialized search services.