This solution is intended to replace the search engine that was publicly available for about three years at http://146.148.124.49/. Like that search engine, it is aimed at translators working in the EN-HU language combination (additional languages may become available later). Users can run linguistic (string-based) searches in large linguistic databases, and the results (together with the source of each result) are returned as HTML tables displayed in the default browser (see the demo video above).
Queries are run against two databases: ECJ_enhu.db, which contains judgements of the European Court of Justice in EN-HU bilingual form, and RefLex_enhu.db, which contains the DG-TRAD translation memories. At this point, the ECJ and RefLex databases contain about 1.9M and 5.6M segments, respectively, and they will be updated from time to time.
Results from each database are displayed in a separate browser tab. At most 1,000 results per database are displayed for each query. Results from the ECJ database are sorted by CELEX number, so that the most recent judgements are displayed on top, while results from the RefLex database are sorted by the length of the source segment, so that the shortest results are displayed on top.
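To make the query behaviour above more concrete, here is a minimal Python sketch of a substring search with the two sort orders and the 1,000-hit cap. It assumes, as the .db extension suggests but this text does not confirm, that the databases are SQLite files, and it uses a hypothetical `segments(source, target, celex)` table; the real schema and query logic of the searchThingy may well differ. The CELEX numbers in the demo are made up.

```python
import sqlite3

MAX_HITS = 1000  # per-database cap on displayed results, as described above

# Hypothetical schema: a table `segments(source, target, celex)`.
# The real table and column names in ECJ_enhu.db and RefLex_enhu.db may differ.
ORDER_BY = {
    # ECJ CELEX numbers begin with the sector digit and the year (e.g. 62019...),
    # so a descending sort lists the most recent judgements first.
    "ECJ": "celex DESC",
    # RefLex: shortest source segments first.
    "RefLex": "LENGTH(source) ASC",
}


def search(conn, db_label, term, limit=MAX_HITS):
    """Return up to `limit` (source, target, celex) rows whose source contains `term`."""
    # Escape LIKE wildcards so that the term is matched as a literal substring.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = (
        "SELECT source, target, celex FROM segments "
        "WHERE source LIKE ? ESCAPE '\\' "
        f"ORDER BY {ORDER_BY[db_label]} LIMIT ?"
    )
    return conn.execute(sql, (f"%{escaped}%", limit)).fetchall()


if __name__ == "__main__":
    # Tiny in-memory demo database with made-up CELEX numbers.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE segments (source TEXT, target TEXT, celex TEXT)")
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?)",
        [
            ("free movement of goods", "áruk szabad mozgása", "62015CJ0001"),
            ("the free movement of goods within the Union",
             "az áruk Unión belüli szabad mozgása", "62019CJ0002"),
        ],
    )
    for row in search(conn, "ECJ", "free movement"):
        print(row)
```

A plain `LIKE` search keeps the sketch simple; note that in SQLite it is case-insensitive only for ASCII letters, so it does not fold case for accented Hungarian letters such as Á/á.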
For each hit, the source document is identified by its CELEX number on the right side of the screen, and it can be opened on EUR-Lex by clicking the link provided.
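A minimal sketch of rendering hits as an HTML page, with the CELEX number in the right-hand column as a link to EUR-Lex. The URL pattern below is one that EUR-Lex accepts for opening a document by CELEX number; whether the searchThingy builds its links the same way is an assumption, and `hits_to_html` is a hypothetical helper.

```python
from html import escape
from urllib.parse import quote

# One URL pattern that EUR-Lex accepts for opening a document by CELEX number.
# The link format used by the searchThingy itself may differ.
EURLEX_URL = "https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:{}"


def hits_to_html(hits, title="Results"):
    """Render (source, target, celex) hits as an HTML page.

    The CELEX number goes in the right-hand column as a link to EUR-Lex.
    """
    rows = []
    for source, target, celex in hits:
        url = EURLEX_URL.format(quote(celex))
        rows.append(
            f"<tr><td>{escape(source)}</td><td>{escape(target)}</td>"
            f'<td><a href="{escape(url)}">{escape(celex)}</a></td></tr>'
        )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head><body>\n<table>\n"
        + "\n".join(rows)
        + "\n</table>\n</body></html>\n"
    )
```

To show such a page in a browser tab, the string could be written to a file with `open(path, "w", encoding="utf-8")` and opened with `webbrowser.open(Path(path).resolve().as_uri())`, one page per database.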
Users can also export entire documents from the databases as EN-HU translation memories in .tmx format by specifying the CELEX number of the required document.
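For reference, a minimal sketch of what such an export could look like in Python, producing a TMX 1.4 file with one `<tu>` per segment pair. The `segments` table, its `celex` column, ordering segments by `rowid`, and the header values (including the tool name) are assumptions for illustration; the actual export in the searchThingy may be implemented differently.

```python
import xml.etree.ElementTree as ET

# Clark notation for xml:lang; ElementTree serializes it with the "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, srclang="en", tgtlang="hu"):
    """Build a minimal TMX 1.4 tree from (source, target) segment pairs."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        # Required TMX 1.4 header attributes; the values here are placeholders.
        "creationtool": "searchThingy-export-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return root


def export_document(conn, celex, path):
    """Export every segment pair of one document (hypothetical schema) as .tmx."""
    pairs = conn.execute(
        "SELECT source, target FROM segments WHERE celex = ? ORDER BY rowid",
        (celex,),
    ).fetchall()
    ET.ElementTree(build_tmx(pairs)).write(path, encoding="utf-8", xml_declaration=True)
    return len(pairs)
```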
How do I start using the searchThingy?
1. Download and install Resilio Sync: https://www.resilio.com/individuals/
Resilio Sync is a (somewhat) free application (well, it's free for private use) that uses peer-to-peer technology to synchronize and transfer data between computers; distributing large files among a number of users is a typical use case. I maintain a master folder containing all files required to use the searchThingy, and Resilio Sync synchronizes the content of this master folder with a folder you select on your PC.
2. Create a target folder on your computer. All files will be downloaded to this folder. NOTE that the databases take up a significant amount of space, so make sure that the folder is on a drive with plenty of free space. Currently, the databases require more than 4 GB, and they are expected to grow with each update, although they should not exceed 10 GB in the foreseeable future.
3. Once you launch Resilio, enter a key into the app (the key identifies the master folder on my computer) and specify the target folder you just created to start syncing. The following key points to the master folder:
BYQ4TSVAKUCG2CVNP2QSMBN6WG2MS6T4M
Here is how you add the key:
(a) Click the large + sign in the top left corner
(b) Select the Enter a key or link option
(c) Enter the key and select the target folder
After some indexing, the download starts automatically. Note that you are downloading large files, so the download may take some time. Also remember that Resilio uses peer-to-peer technology, so each user both downloads data from, and re-shares data with, all other users connected to the same master folder. This increases your download speed, speeds up the distribution of the files, and ensures that you have the latest version of the databases (any update I make to the master databases is automatically distributed to all users).
When the download is finished, you can start using the search engine. Further instructions are provided in the #README.txt file that is downloaded to the user-specified folder.
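Before pointing Resilio at the target folder from step 2, you may want to check that its drive has enough room for the databases to grow. A minimal Python sketch, assuming the 10 GB upper bound mentioned in step 2; `has_room_for_databases` is a hypothetical helper, not part of the searchThingy:

```python
import shutil

GB = 10**9
# Upper bound for the databases mentioned in step 2 (foreseeable future).
EXPECTED_MAX_SIZE_GB = 10


def has_room_for_databases(target_folder, needed_gb=EXPECTED_MAX_SIZE_GB):
    """Return True if the drive holding target_folder has at least needed_gb free."""
    free_bytes = shutil.disk_usage(target_folder).free
    return free_bytes >= needed_gb * GB
```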
Some caveats, suggestions, and pleadings
1. Due to the distribution technology used, the two databases are overwritten with each update. Do not import your own data into these databases, as anything you add will be lost with the next update. If you want to enter and use your own data, create a separate database for it.
2. The whole thing is portable and runs without installation. This means that you can copy the contents of your target folder onto a data stick, and you can carry it around in your briefcase. If you need to search in the databases, just copy everything from the stick onto a computer and you can search away.
3. To make sure that you have the latest version of the databases and the corresponding software, launch the Resilio app every once in a while (or even set it up to run on system startup).
4. When you use the search solution for the first time after turning on your computer (or after a longer break), it may take a few minutes for the hits to appear. This is probably because the operating system has not yet loaded the database files into its memory cache; once it has, search results should be returned much faster.
5. A revolutionary feature of peer-to-peer technology is that it eliminates the need for a powerful and expensive infrastructure to distribute large volumes of data, as each user’s PC (“node”) can download and share data from and to every other node. The more nodes participate in the sharing process, the faster the data transfer and the fewer resources are required from each node. To help your peers get the latest data as fast as possible (and to make sure that you yourself have the latest data and software), please consider keeping Resilio up and running as much as possible. Remember that